1  Overview


Under  this  RADC  contract  the  University  of  Rochester  developed  and  disseminated
papers,  ideas,  algorithms,  analysis,  software,  applications,  and  implementations  for
parallel  vision  applications  and  programming  environments  for  parallel  computer
vision.  The  work  has  been  widely  reported  and  highly  influential.  Faculty  members
involved  have  received  several
prestigious  honors,  including  an  IBM  Faculty  Development  Award  for  Michael  Scott 
and  an  ONR  Young  Investigator  Award  for  Tom  LeBlanc.  We  were  awarded  a  DARPA 
Parallel  Systems  postgraduate  fellowship.  We  have  won  several  Best  Paper  awards. 
From  1984  to  1989  the  department  produced  approximately  400  papers,  more  than 
half  of  which  are  in  refereed  conferences  and  journals.  There  have  been  14  com¬ 
pleted  Ph.D.  theses  directly  related  to  parallel  vision  and  the  related  programming 
environment,  and  approximately  ten  more  such  theses  are  in  progress. 

As  a  part  of  the  RADC  contract,  we  developed  a  heterogeneous  parallel  architec¬ 
ture  involving  pipelined  and  MIMD  parallelism,  and  integrated  it  with  a  high  perfor¬ 
mance  9  degree  of  freedom  robot  head.  The  hardware  of  the  laboratory  is  described 
in  the  next  section.  The  most  significant  environment  development  work  centered  on 
the  Butterfly  Parallel  Processor  and  the  MaxVideo  pipelined  parallel  image  proces¬ 
sor.  For  the  Butterfly,  the  Psyche  multi-model  operating  system  was  developed  (as 
well  as  two  other  experimental  operating  systems),  and  the  Lynx  language  compiler 
ported.  Much  basic  and  influential  performance  monitoring  and  debugging  work  was 
completed,  resulting  in  working  systems  and  novel  algorithms.  There  was  also  signif¬ 
icant  research  in  systems  and  applications  using  the  other  parallel  architecture  in  the 
laboratory,  the  MaxVideo  parallel  pipelined  image  processor. 

Early  in  the  contract  period,  Rochester  demonstrated  SIMD-like  programs  on  the 
BBN  Butterfly  Parallel  Processor  that  show  linear  parallel  speedup.  Many  appli¬ 
cations  for  the  image  processing  pipeline  (including  tracking,  color  histogramming, 
feature  detection,  frame-rate  depth  maps,  frame-rate  time-to-collision  maps,  large- 
scale  correlations,  segmentation  using  motion  blur,  and  others)  have  been  written. 
The  efficacy  of  intimate  cooperation  between  vision  computations  and  controlled  mo¬ 
tion  has  been  demonstrated.  This  work  has  attracted  national  attention  and  won 
international  prizes.  The  Zebra  object-oriented  system  for  Datacube  programming 
was  developed,  and  the  Zed  menu  editor  built  on  top  of  Zebra.  These  programming 
environments  are  useful  for  any  register-level  devices,  and  are  a  considerable  improve¬ 
ment  on  previous  Datacube  environments.  They  are  being  made  available  to  all  by 
anonymous  ftp. 

Programming  MIMD  applications  is  difficult,  and  Rochester  is  a  leader  in  devel¬ 
oping  operating  systems  (PSYCHE),  performance  monitoring  (PPUTTS)  and  debug¬ 
ging  (INSTANT  REPLAY)  to  make  the  job  easier.  The  PLATINUM  system 
solves  automatically  many  of  the  problems  (code  and  data  replication  and  caching)
in  getting  SIMD-like  programs  to  run  efficiently  on  Non-Uniform  Memory  Access
architectures  (such  as  hypercubes,  Butterfly,  Encore,  etc.).  The  MIMD  program
development  tools  (PPUTTS,  Instant  Replay,  and  Moviola)  provide  several  graphical
views  and  a  LISP  interface  to  a  multi-process,  multi-processor  application.  The  sys¬ 
tem  provides  repeatable  single-stepping,  statistics,  symbolic  debugging,  and  other 
“traditional”  debugging  techniques  that  have  not  previously  been  available  to
parallel  programmers.  This  work  has  produced  many  influential  papers,  several  prizes,
and  operational  systems.

At  the  end  of  the  contract  period  the  PSYCHE  operating  system  was  operational, 
and  is  currently  supporting  multi-agent  applications,  and  multi-model  (e.g.  both 
threads  and  heavyweight  processes)  programming  environments.  PSYCHE  has  been 
used  to  support  five  independent  processes  controlling  the  bouncing  of  a  tethered  bal¬ 
loon  with  a  paddle  -  this  hybrid  system  uses  pipelined  parallelism  from  the  MaxVideo 
system  for  low  level  visual  input.  As  a  result  of  the  RADC  contract,  we  are  now  de¬ 
veloping  plans  (the  ARMTRAK  system)  for  integrating  pipelined  parallelism,  MIMD 
parallelism  with  multiple  computational  models  and  sequential  planning  paradigms 
to  manage  a  dynamic  model  railroad  system. 

Rochester  has  implemented  object  recognition  algorithms  in  neural  nets,  and  de¬ 
veloped  hardware  realizations  for  the  resulting  constraint-propagation  networks.  The 
domain  includes  large  sets  of  objects,  and  uses  Bayesian  techniques  to  handle  par¬ 
tial  and  incomplete  information.  The  Rochester  Connectionist  Simulator  and  the 
Zebra/Zed  systems  are  available  by  anonymous  ftp.  Together  they  have  been  dis¬ 
tributed  to  several  hundred  sites  worldwide. 

This  final  report  starts  with  a  quick  guide  to  key  papers  that  have  been  produced 
over  the  years,  and  then  in  turn  briefly  outlines  the  Laboratory,  parallel  computer 
vision  applications,  integration  of  a  cognitive  layer  into  the  system,  support  work 
in  operating  systems,  languages,  utilities,  performance  monitoring,  pipelined  paral¬ 
lelism,  and  technology  transfer  issues.  A  list  of  theses  produced  under  the  contract 
is  included.  More  detail  is  available  from  the  papers  in  the  literature,  and  extensive 
references  are  provided. 


2  Key  Reports  by  Topic 

This  section  briefly  points  out  key  reports.  More  detail  on  these  projects  appears  in 
later  sections  of  this  final  report. 

2.1  Laboratory  for  Parallel  Vision  Research 

During  the  contract  period,  Rochester  developed  and  commissioned  a  binocular  robot 
head,  acquired  and  commissioned  a  multiple  degree-of-freedom  platform  for  the  3-dof 
robot  head,  and  acquired  a  real-time,  pipelined  parallel  image  processing  engine. 
The  laboratory  allows  us  to  test  our  systems  concepts  in  a  complex,  visuo-motor
real-time  environment.  Software  integration  is  important  as  well:  PSYCHE’s  first 
application  will  be  to  manage  the  higher-level  data  structures  (e.g.  the  world  model) 
in  an  integrated  parallel  vision  system  that  also  uses  the  pipelined  parallelism  of  the 
frame-rate  MaxVideo  image  processing  system.  The  key  reports  are  [Brown  et  al. 
1988  (Rochester  Robot);  Ballard  1990  (Animate  Vision);  Ballard  et  al.  1987  (Eye 
Movements);  Brown  and  Rimey  1988  (Coordinate  systems,  kinematics...);  Brown 
1988  (Parallel  Vision  with  the  Butterfly);  Brown  1989a  (Gaze  Control)]. 

2.2  Vision  Applications 

Vision  applications  are  an  important  part  of  our  work,  but  are  only  indirectly  sup¬ 
ported  by  the  contract,  which  views  applications  as  potential  users  of  the  parallel 
systems  we  are  developing.  For  example,  Paul  Chou’s  work  used  the  Markov  Random
Field  formulation  for  intermediate-level  vision  and  produced  quantified  results  that
are  better  than  those  of  any  other  known  technique.  We  have  ported
his  evidence-combination  to  the  Butterfly,  where  it  runs  as  a  set  of  three  cooperating 
agents  under  Tom  LeBlanc’s  SMP  system.  As  another  example,  the  work  of  Cooper 
and  Swain  is  being  ported  to  the  Connection  Machine  at  Syracuse  University’s
Parallel  Computing  Facility,  NPAC.  Object  recognition,  inference,  quantification  of 
performance  in  biologically  oriented  neural  net  computational  techniques,  and  hard¬ 
ware  for  relaxation  computations  have  all  been  under  active  study. 

Several  parallel  vision  applications  were  pursued,  including  Butterfly  program¬ 
ming,  Markov  Random  Field  and  connectionist  research,  and  work  aimed  at  inte¬ 
grating  the  real-time  laboratory  and  using  it  for  complex  planning  tasks  that  in¬ 
clude  sensing  and  acting.  Key  papers  are  [Feldman  et  al.  1988a, b;  Feldman  1987 
(Basic  connectionism);  Simard  et  al.  1988  (Recurrent  backpropagation);  Porat  and 
Feldman  1988  (Learning  theory);  Olson  et  al.  1987  (Vision  on  butterfly);  Ballard 
and  Ozcandarli  1988  (Kinetic  depth  calculations);  Brown  et  al.  1989a  (decentral¬ 
ized  Kalman  filters);  Aloimonos  and  Brown  1988  (Robust  computation  of  intrinsic 
images);  Chou  and  Brown  1988  (Sensor  fusion,  reconstruction  and  labeling);  Wix- 
son  and  Ballard  1990  (Color  histograms);  Rimey  and  Brown  1990  (Hidden  Markov 
models);  Yamauchi  1989  (Juggler);  Nelson  1990  (Flow  fields);  Cooper  1988  (Struc¬ 
ture  recognition);  Sher  1987a, b,c  (Probabilistic  low-level  vision);  Swain  1988  (Object 
recognition  from  large  database);  Swain  and  Cooper  1988  (Parallel  hardware  for  recog¬ 
nition);  Martin,  Brown,  and  Allen  1990  (ARMTRAK  project);  Allen  and  Hayes  1985 
(Theory  of  time);  Allen  1989  (Representing  time)].

2.3  Parallel  Hardware  and  Programming  Languages 

Throughout  the  contract  period  Rochester  has  kept  pace  with  the  technical  develop¬ 
ments  of  the  Butterfly  product  line  of  BBN-ACI.  We  have  owned  three  generations 
of  Butterfly  computers,  including  one  of  the  largest  ever  sold.  Much  of  our  research
transcends  any  particular  piece  of  hardware,  though  its  implementation  of  course 
requires  intimate  familiarity  with  particular  hardware. 

Languages  for  MIMD  parallel  computers  have  been  developed  and  ported  under 
the  contract,  and  quantitative  comparisons  made  between  programming  models.  A 
library  for  programming  the  MaxVideo  pipeline  parallel  image  analysis  hardware  has
also  been  developed.  The  key  reports  are  [LeBlanc  et  al.  1988  (Large-scale  parallel 
programming);  Scott  et  al.  1990  (Multi-model  parallel  programming);  Crowl  1989  (A 
uniform  object  model);  Tilley  1989  (Zebra  for  MaxVideo)]. 

2.4  Parallel  Programming  Environment  -  Operating  Systems

Three  operating  systems  (Elmwood,  Platinum,  Psyche)  have  been  developed  for  the 
Butterfly.  The  most  ambitious  project  is  Psyche,  though  Platinum  automatically
solves  a  number  of  problems  that  users  face  when  using  Uniform  System-style
programming  on  a  MIMD  computer  (automatic  caching  and  data  migration,  for  instance).
The  key  papers  are  [Scott  et  al.  1989b, c  (Psyche  description);  LeBlanc  et  al.  1989b 
(Elmwood  description);  Cox  and  Fowler  1989  (Platinum  description)]. 


2.5  Parallel  Programming  Environment  -  Utilities  and  Libraries

Along  with  languages  and  operating  systems,  Rochester  produced  systems  utilities  for 
communication,  file  systems,  and  compilers.  They  span  a  broad  range  from  parallel 
file  systems  through  new  languages  for  expressing  parallel  computation.  Applications 
packages  such  as  the  current  version  of  the  neural  net  simulator  and  the  image- 
processing  utilities  allow  speedups  of  up  to  a  factor  of  100  over  single-workstation 
implementations.  User  interfaces  to  large  multiprocessor  computers  are  a  difficult 
issue  addressed  by  Yap’s  work,  and  many  of  the  packages  extend  the  range  of
computational  models  available  to  a  user.  For  instance,  the  Ant  Farm  project  provides
a  capability  we  found  we  needed  after  the  first  DARPA  Parallel  Architectures
Benchmark  and  Workshop,  namely  the  ability  to  support  many  lightweight  processes.  The
key  papers  are  [Scott  and  Jones  1988  (Ant  Farm);  Dibble  and  Scott  1989a,b  (Bridge
file  system);  Bolosky  et  al.  1989  (memory  management  techniques);  Goddard  et  al.
1989  (Connectionist  simulator);  LeBlanc  and  Jain  1987  (Crowd  control);  Yap  and 
Scott  1990  (PenGuin)]. 
2.6  Parallel  Programming  Environment  -  Performance  Monitoring

Debugging  and  performance  monitoring  in  an  MIMD  environment  are  significantly 
more  difficult  than  on  a  uniprocessor.  Rochester  contributed  many  results  over  the 
course  of  the  contract.  The  Instant  Replay  system  allows  normal  cyclic  debugging
in  a  nondeterministic  parallel  environment  by  keeping  a  log  of  interactions  between 
processes.  Moviola  is  a  suite  of  interactive  performance  monitoring  tools.  The  key 
papers  are  [LeBlanc  and  Mellor-Crummey  1987  (Instant  Replay);  Fowler  et  al.  1988, 
LeBlanc  et  al.  1990  (Moviola)]. 
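
The record-and-replay idea behind Instant Replay can be illustrated with a minimal sketch. It assumes a single shared object guarded by a version counter, and it treats every access as exclusive; the class name ReplayableObject, the helper run, and the worker names are hypothetical and are not taken from the Rochester implementation. Only the order of accesses is logged, not the data, and a replay run forces each worker to wait for the version it saw during recording.

```python
import threading


class ReplayableObject:
    """Shared list whose access order can be recorded and later replayed."""

    def __init__(self, mode, log=None):
        self.mode = mode                            # "record" or "replay"
        self.log = log if log is not None else {}   # worker -> versions seen
        self.cursor = {}                            # worker -> next log index
        self.version = 0                            # bumped on every access
        self.value = []
        self.cond = threading.Condition()

    def access(self, who, update):
        with self.cond:
            if self.mode == "record":
                # Remember which version this worker observed.
                self.log.setdefault(who, []).append(self.version)
            else:
                # Block until the object reaches the recorded version.
                i = self.cursor.get(who, 0)
                wanted = self.log[who][i]
                self.cursor[who] = i + 1
                self.cond.wait_for(lambda: self.version == wanted)
            update(self.value)
            self.version += 1
            self.cond.notify_all()


def run(mode, log=None, workers=3, steps=4):
    """Run a small multi-threaded program and return (trace, log)."""
    obj = ReplayableObject(mode, log)

    def body(name):
        for k in range(steps):
            obj.access(name, lambda v: v.append((name, k)))

    threads = [threading.Thread(target=body, args=(f"w{i}",))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return obj.value, obj.log


recorded, order_log = run("record")
replayed, _ = run("replay", order_log)
print(replayed == recorded)
```

Because the replayed execution reproduces the recorded interleaving, an ordinary breakpoint-and-rerun debugging cycle becomes possible even though the original run was nondeterministic.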


3  The  Laboratory 

The  Rochester  Robotics  Laboratory  has  developed,  during  the  years  of  the  RADC 
contract,  to  the  configuration  described  in  this  section.  It  currently  consists  of  four 
key  components  (Fig.  1):  a  “head”  containing  cameras  for  visual  input,  a  robot 
arm  that  supports  and  moves  the  head,  a  special-purpose  parallel  processor  for  high- 
bandwidth,  low-level  vision  processing,  and  a  general-purpose  parallel  processor  for 
high-level  vision  and  planning.  This  unique  design  allows  for  visuo-motor  exploration 
over  an  800  cubic  foot  workspace,  while  also  providing  huge  computing  and  power 
resources.  Thus,  we  do  not  suffer  the  communication  and  power  limitations  of  most 
mobile  platforms. 

The  robot  head  (shown  in  Fig.  2),  built  as  a  joint  project  with  the  University’s
Mechanical  Engineering  Department,  has  three  motors  and  two  CCD  high-resolution 
television  cameras  providing  input  to  a  MaxVideo  digitizer  and  pipelined  image- 
processing  system.  One  motor  controls  pitch  or  altitude  of  the  two-eye  platform, 
and  separate  motors  control  each  camera’s  yaw  or  azimuth,  providing  independent 
“vergence”  control.  The  motors  have  a  resolution  of  2,500  positions  per  revolution 
and  a  maximum  speed  of  400  degrees/second.  The  controllers  allow  sophisticated 
velocity  and  position  commands  and  data  read-back. 
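
The motor figures quoted above imply an angular step size and a peak step rate. The short calculation below works them out, assuming the 2,500 positions are spread uniformly over one full revolution; the variable names are illustrative only.

```python
POSITIONS_PER_REV = 2500        # resolution quoted in the text
MAX_SPEED_DEG_PER_S = 400.0     # maximum speed quoted in the text

# Angle covered by one motor position, in degrees and in arcminutes.
step_deg = 360.0 / POSITIONS_PER_REV
step_arcmin = step_deg * 60.0

# Positions traversed per second when the motor runs at full speed.
steps_per_second = MAX_SPEED_DEG_PER_S / step_deg

print(f"{step_deg:.3f} deg per position, {steps_per_second:.0f} positions/s")
```

Under that assumption each position corresponds to 0.144 degrees, so a full-speed move crosses roughly 2,800 positions per second.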

The  robot  body  is  a  PUMA  761  six  degree-of-freedom  arm  with  a  two  meter  radius
workspace  and  a  top  speed  of  about  one  meter/second.  It  is  controlled  by  a
dedicated  LSI-11  computer  implementing  the  proprietary  VAL  execution  monitor  and
programming  interface. 

The  MaxVideo  system  consists  of  several  independent  boards  that  can  be  cabled 
together  to  achieve  many  frame-rate  image  analysis  capabilities:  digitizing,  storage, 
and  transmission  of  images  and  sub-images,  8x8  or  larger  convolution,  pixel-wise 
image  processing,  cross-bar  image  pipeline  switching  for  dynamic  reconfiguration  of 
the  image  pipeline,  look-up  tables,  histogramming  and  feature  location.  A  digital 
signal  processing  computer  on  one  board  can  perform  arbitrary  computations,  and 
also  has  a  high  speed  image  bus  interface  and  a  VME  bus  master  interface  so  it  can
program  the  other  boards  in  the  same  manner  as  the  host.  The  MaxVideo  boards  are
all  register  programmable  and  are  controlled  by  the  Butterfly  or  Sun  via  VME  bus.

Figure  1:  Robotics  Laboratory  Hardware

Figure  2:  The  Rochester  Robot.  A  multi-exposure  photograph  of  the  “Rochester
Robot”  in  action.  The  arm  is  the  largest  industrial  arm  on  the  market,  while  the
unique  head  was  designed  by  Professor  Dana  Ballard.

A  unique  feature  of  our  laboratory,  one  crucial  for  our  future  research,  is  the  ca¬ 
pability  to  use  a  multiprocessor  as  the  central  computing  resource  and  host.  Our 
Butterfly  Plus  Parallel  Processor  contains  28  nodes,  each  consisting  of  an  MC68020 
processor,  MC68851  MMU,  MC68881  FPU,  and  4  MBytes  of  memory.  The  Butterfly
is  a  shared-memory  multiprocessor  with  non-uniform  memory  access  times;  each
processor  may  directly  access  any  memory  in  the  system,  but  access  to  remote  memory
has  approximately  15  times  greater  latency  than  access  to  local  memory.  The
Butterfly  has  a  VME  bus  connection  that  mounts  in
the  same  card  cage  as  the  MaxVideo  and  motor  controller  boards.  Currently,  a  SUN 
workstation  acts  as  a  host  system  for  the  lab.  As  software  develops  on  the  Butterfly, 
we  plan  to  migrate  functionality  from  the  workstation  host  to  the  Butterfly. 
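
The cost of non-uniform memory access can be made concrete with a small model. The sketch below assumes that the factor of 15 quoted above is the ratio of remote to local access latency and that local access costs one unit; the function name mean_access_cost and the uniform-placement scenario are illustrative assumptions, not measurements from the laboratory.

```python
REMOTE_PENALTY = 15.0   # remote/local latency ratio quoted in the text
NODES = 28              # size of the Butterfly Plus described above


def mean_access_cost(local_fraction, remote_penalty=REMOTE_PENALTY):
    """Average cost of one memory reference, in units of a local access."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must lie in [0, 1]")
    return local_fraction + (1.0 - local_fraction) * remote_penalty


# Data scattered uniformly over all nodes: only 1/28 of references are local.
scattered = mean_access_cost(1.0 / NODES)

# Data mostly replicated or migrated next to the processor that uses it.
localized = mean_access_cost(0.9)

print(f"scattered: {scattered:.1f}x local, localized: {localized:.1f}x local")
```

In this model uniformly scattered data costs about 14.5 local accesses per reference, while keeping 90 percent of references local brings the average down to about 2.4, which is why automatic replication and migration of the kind Platinum performs matters on this class of machine.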

The  RADC  contract  supported  the  development  of  parallel  applications  algorithms 
and  the  development  of  software  for  the  two  parallel  computing  engines  in  this  labo¬ 
ratory,  the  Butterfly  and  the  MaxVideo. 


4  Parallel  Vision  Applications 

Although  the  focus  of  the  contract  was  on  developing  a  programming  environment, 
Rochester  also  did  parallel  vision  applications  as  a  test  and  a  driving  force  for  the 
systems  development.  This  section  briefly  outlines  some  of  the  more  influential  of  the 
projects:  more  details  are  available  in  the  literature  [e.g.  Brown  et  al.  1985;  1988]. 

4.1  SIMD-style  Low-level  Vision  on  the  Butterfly 

Rochester  participated  in  the  first  DARPA  benchmark  study.  One  aspect  of  that 
work  motivated  much  of  our  current  research  in  multi-model  parallel  programming 
environments  and  performance  modeling  tools.  The  other  aspect  was  a  successful 
demonstration  that  SIMD-style  (data-parallel)  low-level  vision  applications  could  be 
performed  on  an  MIMD  computer.  Fig.  3  shows  some  results  for  border-following. 
Extensive  analysis  and  demonstration  programs  for  multi-resolution  image  pyramid 
generation,  line-finding,  connected  component  analysis,  and  the  Hough  transform 
were  also  developed  [Brown  1986;  Olson  1986b,c;  Olson  et  al.  1987].
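
The data-parallel style mentioned above can be sketched as follows: the image is cut into bands of rows, independent workers each accumulate a private partial result, and the partial results are merged. This is a minimal illustration using a coarse Hough transform for lines; the band decomposition, the 15-degree angle step, and the names hough_band and parallel_hough are assumptions made for the example, and Python threads show only the program structure, not the speedup obtained on the Butterfly.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

THETAS = range(0, 180, 15)   # line orientations in degrees (assumed step)


def hough_band(edge_pixels):
    """Vote for (theta, rho) line parameters using one band's edge pixels."""
    acc = Counter()
    for x, y in edge_pixels:
        for t in THETAS:
            a = math.radians(t)
            rho = round(x * math.cos(a) + y * math.sin(a))
            acc[(t, rho)] += 1
    return acc


def parallel_hough(image, workers=4):
    """Split the image into row bands, vote per band, then merge the votes."""
    height = len(image)
    band = math.ceil(height / workers)
    bands = []
    for start in range(0, height, band):
        rows = range(start, min(start + band, height))
        bands.append([(x, y) for y in rows
                      for x, v in enumerate(image[y]) if v])
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(hough_band, bands):
            total.update(partial)   # Counter.update adds the partial counts
    return total


# A 12-row by 20-column binary image containing the vertical line x = 7.
image = [[1 if x == 7 else 0 for x in range(20)] for _ in range(12)]
peak = parallel_hough(image).most_common(1)[0]
print(peak)
```

Each worker touches only its own accumulator, so no locking is needed until the final merge; the same pattern applies to histogramming and pyramid generation.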


4.2  Parallel  Object  Recognition 

Paul  Cooper  and  Michael  Swain  cooperated  to  investigate  object  recognition,  based 
on  object  relational  structure  and  some  geometry,  from  a  large  database.  This  work 
was  based  in  a  connectionist,  massively  parallel  framework,  and  led  to  hardware  (VLSI
circuit)  designs  and  implementations  on  the  Connection  Machine  at  NPAC  in  Syracuse,
NY  [Cooper  1988,  1989;  Cooper  and  Swain  1988,  1989;  Swain  and  Cooper  1988;  Swain
1988].

Figure  4:  Previous  work  on  intrinsic  image  calculation.


4.3  Cooperating  Intrinsic  Image  Calculations 

John  Aloimonos  took  a  mathematical  approach  in  his  thesis  to  unifying  several  dis¬ 
parate  results  on  extracting  physical  attributes  from  images  [Aloimonos  et  al.  1985, 
Aloimonos  1986;  Aloimonos  and  Brown  1984a,b,  1988,  1989;  Aloimonos  and  Swain
1985;  Brown  et  al.  1987,  etc.].  The  state  of  knowledge  when  he  started  is  shown  in 
Fig.  4. 

As  a  result  of  his  work,  mathematical  constraints  were  developed  to  allow  these 
calculations  to  be  combined  to  produce  more  robust  results  with  less  restrictive
assumptions.  This  work  is  reported  in  his  recent  book  Integration  of  Visual  Modules,
written  with  Dave  Schulman,  and  summarized  in  Fig.  5.

The  characteristics  of  well-known  visual  problems  are  radically  changed  by  this 
approach  (Fig.  6),  which  yields  robust,  linear  solutions  with  fewer  assumptions. 


Figure  5:  Some  of  Aloimonos’  contributions  in  cooperating  intrinsic  image  calculation. 


Problem                 Passive Observer                    Active Observer
----------------------  ----------------------------------  ----------------------------------
Shape from shading      Ill-posed problem. Needs to be      Well-posed problem. Unique
                        regularized. Even then, unique      solution. Linear equation used.
                        solution is not guaranteed because  Stability.
                        of nonlinearity.

Shape from contour      Ill-posed problem. Has not been     Well-posed problem. Unique
                        regularized up to now in the        solution for both monocular or
                        Tikhonov sense. Solvable under      binocular observer.
                        restrictive assumptions.

Shape from texture      Ill-posed problem. Needs some       Well-posed problem. No assumption
                        assumption about the texture.       required.

Structure from motion   Well-posed but unstable.            Well-posed and stable. Quadratic
                        Nonlinear constraints.              constraints, simple solution
                                                            methods, stability.

Optic flow (area        Ill-posed. Needs to be              Well-posed problem. Unique
based)                  regularized. The introduced         solution. Might be unstable.
                        smoothness might produce
                        erroneous results.

Figure  6:  Combining  constraints  gives  better  solutions  for  vision  problems 

4.4  Markov  Random  Fields  and  Massively  Parallel  IU

In  their  thesis  work,  Dave  Sher  and  Paul  Chou  pursued  a  probabilistic  approach  to 
image  understanding,  which  could  be  implemented  as  a  Markov  Random  Field  [Sher 
1987a, b,c;  Chou  1988,  Chou  and  Brown  1987a, b,  Chou  et  al.  1987;  etc.].  Image 
understanding  then  takes  the  form  of  labelling  individual  pixels  or  features  in  the 
image  with  properties  such  as  “boundary”,  “no  boundary”,  or  a  depth  value.  This 
approach  allows  for  a  uniform  and  real-time  evidence  combination  algorithm  for  multi¬ 
sensor  fusion,  and  a  parallelizable  algorithm  for  the  labelling.  Using  this  approach, 
the  reconstructionist  visual  approach  that  tries  to  create  depth  maps  from  images  is 
integrated  with  the  solution  to  the  segmentation  problem,  which  identifies  boundaries 
and  objects  within  the  scene.  Chou  developed  the  Highest  Confidence  First  algorithm 
for  labelling.  Chou  made  a  quantitative  comparison  of  several  known  Markov
Random  Field  algorithms,  and  HCF  was  shown  to  be  superior  to  all  methods
known  at  the  time.  HCF  is  inherently  sequential.  Later  work  at  Rochester  by  Swain
and  Wixson  parallelized  the  algorithm  for  the  Butterfly,  with  improved  qualitative, 
quantitative,  (and  of  course  timing)  results  [Swain  and  Wixson  1989,  Swain  et  al. 
1989]. 
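
The flavor of a Highest Confidence First labelling loop can be conveyed with a minimal sequential sketch. It assumes a Potts-style smoothness penalty between neighbouring sites and a one-dimensional chain of sites; the energy function, the function name hcf_label, and the example costs are illustrative assumptions and are not taken from Chou's thesis. Sites start uncommitted, a site's stability is the energy margin between its best label and the alternative, and the least stable site is always visited first. A linear scan stands in for the priority heap that an efficient implementation would use.

```python
def hcf_label(data_cost, neighbors, beta):
    """Label sites greedily, committing the most confident site first.

    data_cost[i][l] is the cost of giving label l to site i, neighbors[i]
    lists the sites adjacent to i, and beta is the penalty for disagreeing
    with a committed neighbor (an assumed Potts prior).
    """
    n = len(data_cost)
    num_labels = len(data_cost[0])
    label = [None] * n                      # None marks an uncommitted site

    def energies(i):
        # Uncommitted neighbors contribute nothing to the local energy.
        return [data_cost[i][l] + beta * sum(
                    1 for j in neighbors[i]
                    if label[j] is not None and label[j] != l)
                for l in range(num_labels)]

    def stability(i):
        e = energies(i)
        if label[i] is None:
            best, second = sorted(e)[:2]
            return -(second - best)         # large margin = very unstable
        return min(e[l] - e[label[i]]
                   for l in range(num_labels) if l != label[i])

    while True:
        # Visit uncommitted sites, or committed sites that want to change.
        candidates = [i for i in range(n)
                      if label[i] is None or stability(i) < 0]
        if not candidates:
            return label
        i = min(candidates, key=stability)
        e = energies(i)
        label[i] = min(range(num_labels), key=e.__getitem__)


# Six sites on a chain; site 2 has weak evidence that contradicts its neighbors.
costs = [(0.0, 1.0), (0.1, 0.9), (0.6, 0.4), (0.2, 0.8), (0.9, 0.1), (1.0, 0.0)]
chain = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
print(hcf_label(costs, chain, beta=0.3))
```

In this example the confident end sites commit first, and the weakly supported site 2 is decided last, after both of its neighbors have committed, so it follows them rather than its own noisy evidence.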

Fig.  7  shows  the  performance  of  HCF  on  a  boundary-detection  task. 

Fig.  8  shows  the  results  of  combining  sparse  depth  measurements  with  intensity 
data  to  produce  a  depth  map  of  the  scene  and  a  boundary  map  simultaneously. 

4.5  Pipelined  Parallelism  and  Real-time  Object  Search 

A  good  example  of  the  cooperation  of  real-time  vision  processing  and  a  mobile  ob¬ 
server  is  provided  by  Rochester’s  program  of  work  on  fast  object  detection,  which 
uses  relational  modeling,  and  reasoning  about  occlusion. 

The  ability  to  find  a  certain  object  in  an  unknown  environment  is  a  component  of 
many  real-world  problems  that  a  general-purpose  robot  might  face.  Lambert  Wixson 
studied  this  visual  task,  object  search.  His  research  is  divided  into  three  areas,  all  of 
which  attack  the  key  problem  of  robustly  finding  the  object  in  the  smallest  possible 
time. 

The  first  is  the  problem  of  object  recognition.  Most  research  on  model-based 
object  recognition  from  a  single  camera  has  concentrated  on  robustness.  While  this 
is  obviously  an  important  first  step,  the  object  search  task  brutally  illustrates  that 
speed  is  just  as  important.  Almost  all  current  object  recognition  schemes  require 
that  image  features  be  matched  to  model  features,  requiring  a  time  polynomial  in 
the  number  of  features  to  perform  the  matching.  This  polynomial  time  is  a  result  of 
having  to  match  the  image  features  to  the  model  features  in  order  to  calculate  and 
refine  the  pose  estimate  of  the  object  in  the  scene.  By  adding  an  initial  stage  that 
does  not  perform  pose  calculation  but  rather  simply  detects  the  likely  presence  of 
the  object  in  the  image,  considerable  efficiency  can  be  gained.  The  idea  is  that  this
initial  stage  would  be  used  to  rank  each  gaze  in  a  set  of  candidate  gazes  according  to
the  likelihood  that  the  image  produced  by  the  gaze  contains  the  desired  object.  This
ranking  can  then  be  used  to  choose  the  order  in  which  a  more  sophisticated  object
recognition  program  (which  would  calculate  pose)  should  be  applied  to  the  candidate
images.

Figure  7:  Highest  Confidence  First  algorithm  and  edge-finding.  Boundary  detection
experiment  set  (III).

Figure  8:  Highest  Confidence  First  algorithm,  segmentation,  and  depth  boundary
calculation.  Experiments  with  stereo  disparity  data  (II).

Wixson  [Wixson  and  Ballard  1989]  constructed  an  object  detection  scheme  that
relies  on  the  assumption  that  the  color  histogram  of  an  object  can  be  used  as  an 
object  “signature”  which  is  invariant  over  a  wide  range  of  scenes  and  object  poses. 
The  color  histogram  is  computed  at  3  Hz  by  the  Datacube  hardware,  and  the  matching
compares  18  database  items  to  a  histogram  in  one  second.  Counting  time  to  move 
the  robot  to  a  new  gaze  position  (one  and  one-half  seconds  per  move),  each  gaze  can 
be  evaluated  for  its  object  content  in  just  under  3  seconds.  Fig.  9  shows  some  sample 
results. 
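
The detection stage described above can be sketched in a few lines. The text does not specify how histograms are compared, so the sketch assumes histogram intersection as the match score and a coarse quantization of four levels per color channel; the names color_histogram, intersection, and rank_gazes, and the toy pixel data, are hypothetical.

```python
from collections import Counter

LEVELS = 4   # quantization levels per color channel (an assumption)


def color_histogram(pixels):
    """Normalized histogram of quantized (r, g, b) pixels with values 0..255.

    The pixel list must be non-empty.
    """
    counts = Counter((r * LEVELS // 256, g * LEVELS // 256, b * LEVELS // 256)
                     for r, g, b in pixels)
    total = sum(counts.values())
    return {bin_: c / total for bin_, c in counts.items()}


def intersection(image_hist, model_hist):
    """Histogram intersection: the mass shared by the image and the model."""
    return sum(min(image_hist.get(bin_, 0.0), weight)
               for bin_, weight in model_hist.items())


def rank_gazes(gaze_images, model_hist):
    """Order candidate gazes by decreasing likelihood of containing the object."""
    scores = {gaze: intersection(color_histogram(pixels), model_hist)
              for gaze, pixels in gaze_images.items()}
    return sorted(scores, key=scores.get, reverse=True)


red, blue, green, white = (200, 10, 10), (10, 10, 200), (10, 200, 10), (255, 255, 255)
model = color_histogram([red] * 10)            # signature of a red object
gazes = {
    "left": [blue] * 8 + [red] * 2,
    "center": [red] * 5 + [white] * 5,
    "right": [green] * 10,
}
print(rank_gazes(gazes, model))
```

The ranking decides which gaze the slower pose-calculating recognizer should examine first. The timing figures quoted above are consistent with one another: 1.5 seconds to move, one third of a second for a 3 Hz histogram, and one second of matching add up to about 2.8 seconds per gaze.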

The  second  area  of  object  search  is  the  use  of  high-level  knowledge  of  common 
relationships  and  interactions  between  objects  (i.e.  the  contexts  in  which  certain
objects  typically  appear)  to  direct  the  search  process  [Wixson  to  appear].  For  example,
if  the  robot  is  looking  for  a  pen,  it  might  be  wise  to  search  for  a  desk  first.  We  refer  to
this  use  of  high-level  knowledge  as  indirect  search.  Our  approach  formulates  indirect
search  using  a  finite  set  of  relationships  (FRONT-OF,  NEAR,  LEFT-OF,  etc.)
between  objects.  The  relationships  may  be  known  a  priori  or,  more  interestingly,  derived
from  experience  with  the  scene.  Initially  objects  will  be  represented  as  a  (perhaps 
partial)  local  coordinate  system  (a  circularly  symmetric  object  might  only  have  a  Z 
axis  and  origin,  for  example)  and  a  feature  vector.  Characterizing  the  occurrence  of 
relationships  as  Bernoulli  trials  leads  to  a  confidence  interval  representation  of  the 
probability  of  the  relations  holding.  In  turn,  these  probabilities  can  be  used  in  a 
“highest  impact  first”  search  that  acquires  information  in  the  order  that  maximally 
decreases  expected  uncertainty.  The  result  is  to  derive  Garvey-like  strategies  on  the 
fly,  with  learning,  and  from  first  principles. 
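
Treating each observation of a relationship as a Bernoulli trial gives an interval estimate for the probability that the relationship holds. The text does not say which interval is used, so the sketch below assumes the Wilson score interval at roughly 95 percent confidence; the function name wilson_interval and the pen-and-desk counts are illustrative only.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a Bernoulli success probability."""
    if trials <= 0:
        raise ValueError("at least one trial is required")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials
                         + z * z / (4.0 * trials * trials)) / denom
    return center - half, center + half


# Hypothetical experience: the pen was NEAR the desk in 8 of 10 scenes.
low, high = wilson_interval(8, 10)
print(f"P(pen NEAR desk) lies in [{low:.2f}, {high:.2f}]")

# The same observed rate over ten times as many scenes gives a tighter interval.
low_many, high_many = wilson_interval(80, 100)
print(f"after 100 scenes: [{low_many:.2f}, {high_many:.2f}]")
```

A wide interval marks a relationship about which little is known, so a search strategy that aims to reduce expected uncertainty has a natural quantity to work with.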

The  third  area  of  object  search  involves  reasoning  about  obstacles  and  occlusion 
to  the  extent  that  they  affect  the  task  of  finding  the  desired  object.  This  research 
is  in  progress.  We  would  like  a  system  which  can  reason,  for  example,  that  since  it 
hasn’t  yet  seen  the  object,  but  the  area  under  the  desk  has  not  been  examined,  then 
this  area  should  be  examined.  Many  issues  are  present  in  this  problem.  The  largest  is 
the  choice  of  a  world  representation  which  can  support  this  reasoning  without  being 
computationally  problematic.  The  reasoning  and  world  modeling  must  also  be  robust 
to  sensor  noise  and  marginal  errors  in  the  depth  estimation  process  used  to  detect 
occlusions  in  the  scene. 

Figure  9:  (a)  Top  view  of  the  laboratory  environment  for  a  typical  test  run  showing
the  direction  (but  not  the  distance)  of  each  object  with  respect  to  the  robot.  (b)
Gaze  directions  produced  by  the  object  search  mechanism  for  the  “Clorox”  and  “All”
detergent  boxes.  Area  of  circle  is  proportional  to  the  confidence  of  detection  in  that
gaze.  Numbers  next  to  circles  reflect  the  ordering  of  the  confidences  in  decreasing
order.  The  dashed  lines  in  each  circle  are  merely  to  provide  reference  points.

Figure  10:  Black  and  white  reproduction  of  color  originals  of  (catalog,  instance)  image
pairs.

Wixson’s  work  assumed  a  solution  to  the  object  recognition  problem.  Mike  Swain
investigated  color  cues  for  object  recognition  [Swain  1988a,b].  Fig.  10  shows  19
pairs  of  images  (the  originals  are  colored):  on  the  left  of  each  pair  is  a  catalog  entry,
on  the  right  an  instance  from  a  real  scene.  Fig.  11  shows  confusion  matrices  for  the  19
image  instances  recognized  from  their  catalog  descriptions.  The  instance  views  have 
different  viewing  angles  from  those  that  generated  the  catalog.  The  basic  description 
is  a  color  histogram,  and  a  saliency  measure  subtracts  histogram  features  common 
to  the  ensemble,  thus  weighting  more  heavily  the  features  that  are  unique  to  each 
object. 

4.6  Gaze  Control 

In  research  carried  out  at  Oxford,  Chris  Brown  did  work  on  Kalman  filters  for
tracking  applications  (reported  in  the  DARPA  IU  Proceedings),  on  projectively  invariant
matching  of  geometric  structures  in  images  (reported  in  the  European  Vision
Conference),  and  on  control  of  Rochester’s  robot  head  [Brown  1989b,c;  1990a,b].

The  work  investigated  predictive  mechanisms  to  solve  problems  of  cooperation 
and  delay.  “Subsumption”  architectures  like  those  of  Brooks  and  Connell  find  these 
problems  troublesome  since  internal  state  representations  are  minimized,  control  in¬ 
teraction  is  usually  limited  to  preemption,  and  actions  are  synchronized  only  through 
the  outside  world. 

Figure 11: Color recognition confusion matrices for pairs in previous figure (considered
left to right within top to bottom). (a) Without saliency weighting on features. In
this case, the ranks of the correct choice are as follows (they should be identically 1):
2 1 4 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1. (b) With saliency weighting, the correct choices
uniformly rank first.

The work developed eight camera controls and investigated their interaction. It
showed  that  predictive  techniques  can  overcome  the  catastrophic  effects  of  delays  and 
interactions.  It  made  comparisons  with  primate  gaze  controls  and  with  an  open-loop 
approach  to  delay.  Tracking,  gaze  shifts,  and  vergence  controls  used  three  dimen¬ 
sional,  not  retinal,  coordinates.  Optimal  estimation  techniques  were  used  to  estimate 
and  predict  the  dynamic  properties  of  the  target. 
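
As an illustration of how prediction can compensate for a known delay, the sketch below uses an alpha-beta filter, a fixed-gain simplification swapped in here for the optimal (Kalman) estimators used in the work. It estimates target position and velocity from position measurements and extrapolates over the delay. The class name, gains, and one-dimensional state are assumptions made for the example.

```python
class AlphaBetaTracker:
    """Fixed-gain position/velocity estimator for a single coordinate."""

    def __init__(self, alpha=0.5, beta=0.1, dt=1.0):
        self.alpha = alpha  # gain applied to the position correction
        self.beta = beta    # gain applied to the velocity correction
        self.dt = dt        # time between measurements
        self.pos = 0.0
        self.vel = 0.0

    def update(self, measured_pos):
        """Fold one position measurement into the state estimate."""
        predicted = self.pos + self.vel * self.dt
        residual = measured_pos - predicted
        self.pos = predicted + self.alpha * residual
        self.vel = self.vel + (self.beta / self.dt) * residual
        return self.pos

    def predict_ahead(self, delay):
        """Extrapolate the target position `delay` time units into the future."""
        return self.pos + self.vel * delay


# A target moving at constant velocity 2.0, observed once per time unit.
tracker = AlphaBetaTracker()
for t in range(200):
    tracker.update(2.0 * t)
predicted = tracker.predict_ahead(3.0)  # where the target should be 3 units later
```

A controller that aims at `predicted` rather than at the last measurement is acting on where the target will be when the delayed command takes effect.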

The  control  algorithms  are  run  in  a  simulation  that  is  meant  to  be  general  and 
flexible,  but  especially  to  capture  the  relevant  aspects  of  the  Rochester  Robot.  Previ¬ 
ous  work  with  the  Rochester  Robot  had  already  produced  several  implementations  of 
potential basic components of a real-time gaze-control system. These components included
basic capabilities of target tracking, rapid gaze shifts, gaze stabilization against
head  motion,  verging  the  cameras,  binocular  stereo,  optic  flow  and  kinetic  depth  cal¬ 
culations.  These  separate  capabilities  do  not  yet  cooperate  to  accomplish  tasks.  The 
work  at  Oxford  was  partly  motivated  by  the  need  to  integrate  several  capabilities 
smoothly  for  a  range  of  tasks  useful  for  perception,  navigation,  manipulation,  and  in 
general  “survival”. 

There  are  four  main  coordinate  systems  of  interest  in  this  work:  LAB,  HEAD, 
and  (left  and  right)  camera  and  retinal  (Fig.  12).  The  LAB,  HEAD,  and  camera 
systems  are  three-dimensional,  right-handed  and  orthogonal.  The  retinal  system  is 
two-dimensional  and  orthogonal.  LAB  is  rigidly  attached  to  the  environment  in  which 
the  animate  system  and  objects  move.  HEAD  is  rigidly  attached  to  the  head,  and 
(for  this  work)  has  three  rotational  and  three  translational  degrees  of  freedom.  The 
camera  systems  are  rigidly  attached  to  the  cameras  and  have  independent  pan  and 
a  shared  tilt  degree  of  freedom.  The  retinal  systems  represent  image  coordinates 
resulting  from  perspective  projection  of  the  visible  world.  The  cameras  are  supported 
on  a  kinematic  chain  so  that  their  principal  points  do  not  in  general  lie  on  any  head 
rotation,  pan,  or  tilt  axis. 
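
The chain of coordinate systems can be sketched as a sequence of rigid transformations followed by perspective projection. The example below is deliberately simplified: each frame is offset from its parent and rotated by a single pan angle about the y axis, the camera looks along its +z axis, and the focal length is 1. These conventions and all function names are assumptions for illustration; the actual head has more degrees of freedom than are modeled here.

```python
import math


def rotate_y(p, angle):
    """Rotate point p = (x, y, z) about the y axis by `angle` radians."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def to_frame(p, origin, pan):
    """Express p (given in the parent frame) in a child frame that sits at
    `origin` and is panned by `pan` radians about the parent's y axis."""
    shifted = tuple(a - b for a, b in zip(p, origin))
    return rotate_y(shifted, -pan)


def project(p_cam, focal_length=1.0):
    """Perspective projection of a camera-frame point onto the retinal plane."""
    x, y, z = p_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (focal_length * x / z, focal_length * y / z)


def lab_to_retina(p_lab, head_origin, head_pan, cam_origin, cam_pan):
    """LAB -> HEAD -> camera -> retinal coordinates."""
    p_head = to_frame(p_lab, head_origin, head_pan)
    p_cam = to_frame(p_head, cam_origin, cam_pan)
    return project(p_cam)


# A camera offset 0.1 from the head origin sees a point slightly above its axis.
image_xy = lab_to_retina((0.1, 0.2, 1.0), (0.0, 0.0, -1.0), 0.0, (0.1, 0.0, 0.0), 0.0)
```

Because the camera origin is offset from the head origin, a head rotation changes both the direction and the position of the camera, which is the kinematic coupling noted above.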

The  simulated  system  controls  are  summarized  in  Table  1.  Our  purpose  was  to 
investigate,  with  some  flexibility,  the  interactions  of  various  forms  of  basic  camera  and 
head  controls.  The  controls  are  not  meant  to  model  those  of  any  biological  system. 
Rather  the  goal  was  to  build  a  system  with  sufficient  functionality  to  exhibit  many 
control  interactions.  The  interaction  of  a  subset  of  these  controls  on  target  tracking 
and  acquisition  tasks  (the  “smooth  pursuit”  and  “saccadic”  systems)  was  investigated 
and  was  used  to  illustrate  the  effects  of  different  control  algorithms  for  coping  with 
delays. 

Fig. 13 shows five of the control systems. These controls can act together
(Fig. 14) to achieve different complex visual tasks such as quick target acquisition
and then tracking (Fig. 15). Extending the control system to deal with delays
requires kinematic simulation of the head and dynamic simulation of the outside
world (Fig. 16).

Figure 13: Five representative head and camera controls.

Figure 15: Increasingly effective delay-free control results from superposition of
non-interacting controllers. Left and right pan and tilt angular errors in gaze direction (in
radians) are plotted against time. The hollow square shows left camera pan error, the
butterfly right camera pan error, and the dark square and hourglass show left and
right tilt errors. (a) Tracking reflex only (one dominant eye, mechanical stops are
hit); (b) adding vergence and head compensation destabilizes the system; (c) adding
vestibulo-ocular (gaze stabilization) reflex stabilizes system and tracking proceeds
faultlessly.

Figure 16: The extended control algorithm for delayed system.

CONTROL             INPUT                          ALT. INPUT      OUTPUT
------------------------------------------------------------------------------------
EYE
  [illegible]       Target (x, y), [illegible]     [illegible]     L. Pan, Tilt vel.
  Track             Target (x, y)                  [illegible]     L. Pan, Tilt vel.
  Gaze Stabilize    Head Origin Rx, Ry, X, Y, Z                    L. Pan, Tilt vel.
  Vergence          Horiz. Disparity                               R. Pan vel.
  Virtual Position  Target (X, Y, Z)                               L. Pan, Tilt vel.
HEAD
  Compensate        Eye Pans, Tilt                                 (Rx, Ry)
  Fast Head Rotate  [illegible]
  Virtual Position  Target (X, Y, Z)

Table 1: Eye and head control summary. The ALT. INPUT column shows alternate
forms of input. (x, y) are image coordinates, (X, Y, Z) are world coordinates, and (Rx, Ry)
are head rotation angles. A design issue is whether fast gaze shifts and tracking are
performed only by the “dominant eye” camera or by both cameras. Likewise, vergence
can affect both cameras or only the non-dominant camera.

4.7  Parallel  Cooperating  Agents  and  Juggler 

Our  first  robotics  application,  a  balloon  bouncing  program  called  Juggler,  successfully 
ran  in  November  1989  [Yamauchi  1989].  This  application  combines  binocular  camera 
input,  a  pipelined  image  processor,  and  a  6-degree-of-freedom  robot  arm  (with  a 
squash  racquet  attached)  to  bounce  a  balloon.  The  implementation  uses  a  competing 
agent  model  of  motor  control;  five  processes  compete  with  each  other  for  access  to  the 
robot  arm  to  position  the  balloon  in  the  visual  field,  to  position  the  racquet  under 
the  balloon,  and  to  hit  the  balloon. 

Each application process is allocated a physical processor, so scheduling is not a
concern. Juggler is robust because, even if processes had to share processors, failure
to execute any one process during a particular time interval would have little if any
effect on behavior; in the competing agent model, each application process continually
broadcasts commands to the robot in competition with other processes.
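
The robustness argument can be illustrated with a small sketch of a competing agent loop. The three agents, their priorities, and the highest-priority-wins arbitration rule below are hypothetical stand-ins chosen for the example (Juggler itself used five processes, and its arbitration details are not given here). The point is only that when one agent misses a time step, another agent's command is still issued.

```python
def arbitrate(proposals):
    """Pick the command with the highest priority, or None if nobody proposed."""
    if not proposals:
        return None
    return max(proposals, key=lambda p: p[0])[1]


def track_agent(state):
    # Always wants to keep the balloon in the visual field (lowest priority).
    return (1, "center balloon in view")


def position_agent(state):
    # Proposes moving the racquet once the balloon is descending toward it.
    if state["balloon_height"] < 2.0:
        return (2, "move racquet under balloon")
    return None


def hit_agent(state):
    # Proposes a swing only when the balloon is close to the racquet.
    if state["balloon_height"] < 0.5:
        return (3, "swing racquet")
    return None


AGENTS = [track_agent, position_agent, hit_agent]


def step(state, agents=AGENTS, skipped=()):
    """Run one time step; agents listed in `skipped` failed to execute this step."""
    proposals = []
    for agent in agents:
        if agent in skipped:
            continue
        proposal = agent(state)
        if proposal is not None:
            proposals.append(proposal)
    return arbitrate(proposals)


normal = step({"balloon_height": 0.3})
degraded = step({"balloon_height": 0.3}, skipped=(hit_agent,))
```

In the degraded step the arm still receives a sensible command, and the skipped agent simply competes again on the next step.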

Juggler  was  a  first  attempt  to  integrate  our  operating  systems  efforts  with  the 
development  of  applications.  As  a  result  of  our  experiences  with  Juggler,  we  are 
making  appropriate  extensions  to  Psyche  and  communications  capabilities,  and  we 
have  begun  to  experiment  with  user-level  scheduling. 

4.8  The  Workbench  for  Active  Vision  Experimentation

The  Workbench  for  Active  Vision  Experimentation  (WAVE)  has  been  an  ongoing 
effort  since  the  summer  of  1988.  Its  purpose  is  to  provide  a  uniform  and  general 
purpose  platform  for  experimental  verification  of  our  research  [Brown  1988a, b;  Rimey 
1990]. 

WAVE  essentially  was  the  first  effort  to  “integrate  everything  in  the  Lab”.  The 
original  goals  were  to  build  a  system  which  causes  the  Puma  robot  to  visually  explore 
its  environment  for  racquet  balls  randomly  hanging  from  the  Lab  ceiling,  and  also 
to  produce  an  accompanying  repertory  of  simple  modular  behaviors  and  capabilities. 
In  this  system  the  Robot  first  moves  to  scan  the  entire  Lab  and  locates  each  ball 
using  binary  image  analysis  and  stereo  vision.  Next  the  Robot  moves  around  a  ball 
while  keeping  it  centered  in  the  field  of  view.  A  simple  animate  vision  technique  is 
demonstrated  by  computing  a  continuously  time  averaged  image.  The  accompanying 
Robot  movement  causes  the  background  areas  in  the  image  to  be  blurred  while  the 
object  remains  clear,  thus  demonstrating  a  simple  segmentation  technique.  Finally 
the  robot  pokes  each  ball  with  a  stick.  The  now  moving  ball  is  visually  tracked  using 
the  eye  motors  on  the  Head  (another  simple  animate  vision  idea).  The  overall  system 
is  more  fully  described  in  reports  by  Brown.  A  further  result  of  this  effort  was  a  guide 
for  other  members  of  our  group  on  “how  to  use  the  Rochester  robot”. 
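
The time-averaging idea can be sketched in a few lines. The example assumes an exponentially weighted running average over grayscale frames stored as nested lists; the weighting scheme and the function name are assumptions, since the text says only that the image is continuously time averaged.

```python
def update_average(avg, frame, weight=0.1):
    """Blend a new frame into the running average, pixel by pixel."""
    return [
        [(1.0 - weight) * a + weight * f for a, f in zip(avg_row, frame_row)]
        for avg_row, frame_row in zip(avg, frame)
    ]


# One-row, two-pixel toy sequence: pixel 0 is the fixated object (constant),
# pixel 1 is background that changes from frame to frame as the robot moves.
frames = [[[100.0, 200.0 if i % 2 else 0.0]] for i in range(200)]
average = frames[0]
for frame in frames[1:]:
    average = update_average(average, frame)
```

The fixated pixel keeps its value in the average, while the changing background pixel is smeared toward its mean, which is the blurring effect that separates object from background.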

Last  summer  the  WAVE  platform  was  put  to  further  use  in  a  study  of  the  problem 
of  moving  the  Head  to  view  the  front  of,  or  a  characteristic  view  of,  an  object.  The 
idea which we developed was to model vision with a parameter net, to model
Head movements with a basic PID controller, and then to study differential relationships
between response patterns in the parameter net and the command signals
sent to the PID controller. The parameter net represented an object using a Hough
transform of its silhouette. Nearness to a characteristic viewpoint was related to a
distortion measure over nodes in the parameter net. The system was implemented,
but  performed  poorly.  A  similar  effort  based  on  a  color  image  approach  (Wixson’s 
work,  described  above)  performed  slightly  better. 
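
For reference, a basic discrete PID controller of the kind mentioned above can be sketched as follows. The gains, time step, and the toy plant (a joint whose angle integrates the commanded velocity) are assumptions for illustration and are not taken from the WAVE implementation.

```python
class PID:
    """Textbook discrete PID controller acting on an error signal."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, error):
        """Return the control output for the current error."""
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0  # no previous sample yet
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def drive_to(target, controller, steps):
    """Toy plant: the joint angle integrates the commanded velocity."""
    angle = 0.0
    for _ in range(steps):
        command = controller.step(target - angle)
        angle += command * controller.dt
    return angle


final_angle = drive_to(1.0, PID(kp=2.0, ki=0.0, kd=0.0, dt=0.1), steps=100)
```

In the study described above, the quantity of interest would be how the command signal produced by such a controller relates to changes in the parameter net's response.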

Over time WAVE has evolved into a more general platform. In anticipation of moving
over to the Psyche operating system running on the Butterfly parallel computer,
WAVE was converted to the GNU C++ (g++) language used by Psyche and to use
the Zebra system for programming our DataCube MaxVideo
image processing hardware. Zebra currently works with the Psyche/Butterfly system
as  well  as  the  original  Sun  machines.  WAVE  itself  has  not  yet  been  adapted  to  run 
under  Psyche. 

4.9  Modeling  Attentional  Behavior  Sequences  with  an  Augmented  Hidden  Markov  Model

Selective attention, or the intelligent application of limited visual resources, has
emerged as a basic topic for a long-range program of research we are now pursuing.
The concept here is that any realistic system has to deal with limited sensing and
computational resources, and that we should therefore focus our study on
selective-attention mechanisms for dealing with such limited-resource situations.

One approach is to map the visual attention problem onto sensor allocation problems
such as where to point a camera and where to allocate processing within a single
image from that camera. If we assume a spatially-variant sensor (such as one with a
small, high-resolution fovea and a large, low-resolution periphery), one specific problem
is to decide what sequence of eye movements to make to selectively position the
fovea in the scene. One aspect of the work attacks the specific problem of modeling
foveation sequences. In most treatments of this subject, a sequence of eye movements
emerges as a result of sequential cognitive effort and image analysis, and is not explicitly
represented. We decided to augment the usual paradigm with a new explicit
representation of probabilistic but task-dependent attentional sequencing. Explicit
sequences are something like motor skills; they efficiently capture the effect of much
cognitive activity and feedback-mediated behavior, and allow it to be generated quickly
with low cognitive overhead.

The  explicit  representation  is  an  augmented  hidden  Markov  model  (AHMM).  A 
simple  hidden  Markov  model  can  learn  an  emergent  behavior  and  re-generate  it  as 
an  explicit  data-oblivious  sequence.  An  AHMM  incorporates  a  feedback  sequence 
to  modify  the  generated  sequence.  It  can  therefore  relearn  or  constantly  modify  its 
own (feedback-modified) explicit behavior, thus adapting to varying conditions. Two
AHMM models have been developed. The first uses a simple external feedback
loop; the second uses internal feedback, which modifies the internal parameters
(probabilities) of the AHMM, thus affecting the generation likelihoods directly. This
work  has  been  experimentally  verified  using  the  capabilities  of  WAVE  and  the  results 
are  encouraging  [Rimey  and  Brown  1990]. 
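
The internal-feedback variant can be illustrated with a toy generator. The sketch below is not the AHMM of the cited report: it is a small hidden Markov model whose transition probabilities are nudged by a feedback signal and renormalized, with the update rule, class name, and parameters all chosen as assumptions for the example.

```python
import random


class FeedbackHMM:
    """Toy HMM sequence generator with feedback applied to its transition table."""

    def __init__(self, transitions, emissions, seed=0):
        self.A = [row[:] for row in transitions]  # state-to-state probabilities
        self.B = emissions                        # per-state {symbol: probability}
        self.rng = random.Random(seed)
        self.state = 0

    def _sample(self, weighted_items):
        """Draw one item from a list of (item, probability) pairs."""
        r = self.rng.random()
        total = 0.0
        for item, p in weighted_items:
            total += p
            if r < total:
                return item
        return weighted_items[-1][0]

    def step(self):
        """Move to the next hidden state and emit a symbol (e.g. a fixation target)."""
        prev = self.state
        self.state = self._sample(list(enumerate(self.A[prev])))
        symbol = self._sample(list(self.B[self.state].items()))
        return prev, self.state, symbol

    def reinforce(self, prev, nxt, rate=0.2):
        """Feedback: make the transition prev -> nxt more likely, then renormalize."""
        row = self.A[prev]
        row[nxt] += rate
        total = sum(row)
        self.A[prev] = [p / total for p in row]


model = FeedbackHMM(
    transitions=[[0.5, 0.5], [0.5, 0.5]],
    emissions=[{"look left": 1.0}, {"look right": 1.0}],
)
before = model.A[0][1]
model.reinforce(0, 1)  # feedback says the 0 -> 1 move was useful
sequence = [model.step()[2] for _ in range(10)]
```

Without calls to `reinforce`, the model replays its learned statistics regardless of the data; with them, the generated sequence drifts toward the transitions that feedback rewards.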


5  Planning  in  a  Parallel  System 

We have been exploring ways of forming and executing strategies that involve sequences
of primitive behaviors. Actions and perception are the only realistic way
to bring computerized decision-making and planning into contact with reality. This
“planning” capability is necessary for systems that are to be more than reflexive [Feist
1989a, b], and which must solve problems and make decisions about what to do next
[Allen and Pelavin 1986; Allen et al. 1990]. Making such decisions with uncertain
information under time constraints is beyond the current state of the art, although
decision-making under uncertainty, reasoning about actions through time [Allen 1989;
Allen and Hayes 1987], and in general the questions of what to believe and what to
do next pervade all of intelligent behavior. At Rochester, these questions are being
investigated in the context of ARMTRAK [Martin et al. 1990], a micro-world under
development, based on the control of model trains, designed to integrate work in
natural language, planning, vision, and robotics.

Two  versions  of  ARMTRAK  have  been  implemented:  a  simulation  and  a  set  of 
trains  coupled  to  the  sensors  associated  with  the  Rochester  Robot.  The  simulation 
allows  rapid  prototyping  of  planners  and  experimentation  with  problems  posed  by 
different  layouts.  Simulations  invariably  involve  simplifying  assumptions,  however,  so 
the  real  trains  and  sensors  in  the  vision  lab  allow  us  the  rare  opportunity  of  running 
a  symbolic  planner  in  the  real  world.  The  train  controller  has  been  wired  so  that  the 
switchyard can be operated from outside the robot room. The vision routines
can detect a moving train in the robot’s field of view and can
determine the state of a switch in that field of view. The robot also knows the locations
of  the  switches,  so  it  can  position  itself  to  observe  them.  Despite  its  potential,  the 
ARMTRAK  implementation  is  currently  a  demonstration  of  concept  only.  It  does 
not  have  a  smooth  interface  between  the  LISP  world,  where  all  the  work  on  planning 
takes  place,  and  the  C  environment,  where  the  vision  work  is  implemented.  Our  goal 
is  to  support  LISP  on  our  multiprocessor,  and  to  have  shared  data  structures  linking 
the  symbolic  reasoner  and  the  perception  and  action  components  of  the  system,  which 
themselves  will  rely  on  the  integrated  soft  and  hard  real-time  subsystems  mentioned 
above. 

For  ARMTRAK  and  other  similar  systems  of  the  future,  we  would  like  to  pro¬ 
vide  a  solid  substrate  of  visuo-motor  behaviors  and  primitive  capabilities,  based  on 
well-understood  real-time  technology.  The  user  of  these  capabilities  should  not  have 
to  think  about  the  details  of  their  operation.  Likewise,  primitives  for  cooperation, 
preemption,  and  parallel  operation  of  these  low-level  capabilities  should  be  provided: 
a  smooth  integration  of  hard  and  soft  real-time  systems  is  an  important  aspect  of  this 
work. 

In  addition  to  our  ARMTRAK  work,  our  studies  of  learning  algorithms  have 
revealed  ways  of  learning  correct  primitive  sequences  by  trial  and  error  or  training 
[Whitehead  and  Ballard  1990;  Rimey  and  Brown  1990].  This  work  suggests  ways 
that  systems  can  learn  to  adapt  behaviors  in  complex  environments  and  lays  the 
groundwork  for  building  systems  that  satisfice. 


6  Parallel  Operating  Systems  and  the  Psyche 
Project 

The centerpiece of the CER hardware grant (on which much of the RADC research
was based) was the purchase in 1985 of a 128-node BBN Butterfly Parallel Processor.
Over  the  course  of  the  contract  this  machine  was  used  to  support  research  in  paral¬ 
lel  programming  systems,  computer  vision,  massively  parallel  connectionist  models, 
and  the  theory  of  parallel  computation.  CER  allowed  us  to  acquire  and  experiment 
with several generations of the Butterfly Parallel Processor from BBN-ACI. In particular,
a later-generation Butterfly was obtained for operating systems research and
applications.  Psyche  is  now  the  major  activity  surrounding  the  Butterfly.  Activity 
in  the  Psyche  group  involves  directly  or  indirectly  two  faculty  members  and  four  to 
six  graduate  students.  Psyche  was  running  its  first  jobs  just  when  the  CER  support 
terminated,  and  since  then  it  has  been  expanding  in  usefulness  to  the  user  community. 

One goal was to create a programming environment for MIMD (multiple instruction
stream, multiple data stream) style computers. This architecture is complementary
to other styles of parallel computing such as SIMD (in which identical computations
are performed in parallel on different data) and neural nets.

The problem with MIMD computation, which allows multiple independent, cooperating
large processes and processors to run concurrently, is that the interactions
between programs (for instance their data accessing) are extremely hard to monitor
and even to repeat, given the potential for race conditions and the scheduling
differences that can take place from run to run. Further, there are several competing,
individually adequate models of parallel programs at this level. For instance,
message-passing models and shared-memory models offer rather different user views
of the computational resource. Although hardware was being built (like the BBN
Butterfly Parallel Processor) to support different models of parallel computation, the
state of the art lacked an operating system that could support
several such models at once.

To improve the state of the art in programming, conceptualizing, monitoring performance,
and optimizing efficiency in MIMD computation, we developed systems
like PSYCHE (an operating system) and MOVIOLA (a kit of performance monitoring
and debugging tools). Altogether we also produced and exported about a dozen
other less ambitious systems and libraries. The interaction of the MOVIOLA debugging
and performance monitoring tools has had unexpected efficacy not just in
debugging but in algorithm development.

6.1  Early  Work 

At  the  time  our  Butterfly  was  purchased  it  was  not  yet  clear  whether  shared  memory 
would be practical in large-scale multiprocessors. Previous architectures had been
limited  in  size;  our  Butterfly  and  its  twin  at  BBN  were  for  several  years  the  largest 
shared-memory  multiprocessors  in  the  world,  by  a  large  margin.  Potential  problems 
with  memory  and  interconnect  contention,  the  management  of  highly-parallel  shared 
data  structures,  and  the  need  to  maximize  locality  of  reference  made  our  purchase  a 
risky  venture.  One  of  the  most  important  results  of  our  research  was  to  show  that 
none  of  these  problems  is  insurmountable.  We  used  the  Butterfly  to  obtain  significant 
speedups (often nearly linear) on over 100 processors with a range of applications that
includes various aspects of computer vision [Brown et al. 1986; Brown 1988b; Olson
et al. 1987; Olson 1986b, c], connectionist network simulation [Feldman et al. 1988b],
numerical algorithms [LeBlanc 1987, 1988a], computational geometry [Bukys 1986],
graph theory [Costanzo et al. 1986], combinatorial search [LeBlanc et al. 1988; Scott
1989], lexical and syntactic analysis [Gafter 1987, 1988], and parallel data structure
management [Mellor-Crummey 1987].

We  also  demonstrated,  through  our  research  in  parallel  programming  environ¬ 
ments  and  tools,  that  shared-memory  machines  are  flexible  enough  to  support  effi¬ 
cient  implementations  of  a  wide  range  of  programming  models,  with  both  coarse  and 
fine-grain  parallelism. 

From  1984  to  1987,  our  systems  work  is  best  characterized  as  a  period  of  experi¬ 
mentation,  designed  to  evaluate  the  potential  of  large  NUMA  (non-uniform  memory 
access)  multiprocessors  and  to  assess  the  need  for  software  tools.  In  the  course  of 
this  experimentation  we  ported  three  compilers  to  the  Butterfly  [Scott  1989;  Olson 
1986a;  Crowl  1988b],  developed  five  major  and  several  minor  library  packages  [Crowl 
1988b;  Low  1986;  LeBlanc  1988b;  LeBlanc  and  Jain  1987;  Scott  and  Jones  1988; 
Olson  1986;  LeBlanc  and  Mellor-Crummey  1986;  Fowler  et  al.  1989],  and  built  a 
parallel  file  system  [Dibble  and  Scott  1989a, b;  Dibble  et  al.  1988]  and  two  different 
operating systems [LeBlanc et al. 1989b; Cox and Fowler 1989]. Our work with the
Lynx  distributed  programming  language  [Scott  1987]  yielded  important  information 
on  the  inherent  costs  of  message  passing  [Scott  and  Cox  1987]  and  the  semantics 
of  the  parallel  language/operating  system  interface  [Scott  1986].  Experience  with 
a  C++  communication  library  yielded  similar  insights  for  object-oriented  systems 
[Crowl  1988b]. 

A  major  focus  of  our  experimentation  with  the  Butterfly  was  the  evaluation  and 
comparison  of  multiple  models  of  parallel  computing  [Brown  et  al.  1986;  LeBlanc 
et  al.  1988;  LeBlanc  1986,  1988a].  BBN  had  already  developed  a  model  based  on 
fine-grain  memory  sharing  [LeBlanc  1986].  In  addition,  among  the  programming 
environments  listed  above,  we  have  implemented  remote  procedure  calls  [Low  1986]; 
an  object-oriented  encapsulation  of  processes,  memory  blocks,  and  messages  [Crowl 
1988b];  a  message-based  library  package  [LeBlanc  1988b];  a  shared-memory  model 
with  numerous  lightweight  processes  [Scott  and  Jones  1988];  and  a  message-based 
programming language [Scott 1989]. In an intensive benchmark study conducted in
1986 [Brown et al. 1986], we implemented seven different computer vision applications
on the Butterfly over the course of a three-week period. Based on the characteristics of
the  problems,  programmers  chose  to  use  four  different  programming  models,  provided 
by  four  of  our  systems  packages.  For  one  of  the  applications,  none  of  the  existing 
packages  provided  a  reasonable  fit,  and  the  awkwardness  of  the  resulting  code  was  a 
major  impetus  for  the  development  of  yet  another  package  [Scott  and  Jones  1988]. 

Our principal conclusion from this experimentation was that while every programming
model has applications for which it seems appropriate, no single model
is  appropriate  for  every  application.  Just  as  a  general-purpose  uniprocessor  system 
must  permit  programs  to  be  written  in  a  wide  variety  of  languages  (encompassing 
a  wide  variety  of  models  of  sequential  computation),  we  formed  the  belief  that  a 
general-purpose  multiprocessor  system  must  permit  programs  to  be  written  under  a 
wide  variety  of  parallel  programming  models.  This  conviction  motivated  development 
of  the  Psyche  operating  system. 

6.2  Psyche  Motivation 

As  outlined  above,  our  early  work  led  to  several  conclusions. 

1)  Large-scale  shared-memory  multiprocessors  are  practical.  We  achieved  signif¬ 
icant  speedups  (often  almost  linear)  using  over  100  processors  on  a  wide  range  of 
applications  with  many  different  operating  systems,  library  packages,  and  languages. 
Shared-memory  multiprocessors  appear  to  be  able  to  support  coarse-grain  parallelism 
just  as  efficiently  as  message-based  multicomputers,  while  simultaneously  support¬ 
ing  very  fine-grain  interactions.  They  provide  an  extremely  flexible  foundation  for 
general-purpose  parallel  computing,  and  for  high-level  vision  in  particular. 

2)  Programmers  need  multiple  models  of  parallel  computation.  Though  many  styles 
of  communication  and  process  structure  can  be  implemented  efficiently  on  a  shared 
memory  machine,  no  single  model  can  provide  optimal  performance  for  all  applica¬ 
tions.  Moreover,  subjective  experience  indicates  that  conceptual  clarity  and  ease  of 
programming  are  maximized  by  different  models  for  different  kinds  of  applications. 
In  the  course  of  our  benchmark  experiments  [Brown  et  al.  1986],  seven  different  prob¬ 
lems  were  implemented  using  four  different  programming  models.  One  of  the  basic 
conclusions  of  the  study  was  that  none  of  the  models  then  available  was  appropriate 
for  certain  graph  problems;  this  experience  led  to  the  development  of  the  Ant  Farm 
library  package  [Scott  and  Jones  1988].  Large  embedded  applications  (such  as  vision) 
may  well  require  different  programming  models  for  different  components;  it  therefore 
seemed  important  to  be  able  to  communicate  across  programming  models  as  well. 

3) An efficient implementation of a shared name space is valuable even in the
absence of uniform access time. We found one of the primary advantages of shared
memory to be its familiar computational model. A uniform addressing environment
allows programs to pass pointers and data structures containing pointers without
explicit translation. This uniformity of naming appears to be the primary reason
why programmers choose to use BBN’s Uniform System package. Even when
non-uniform access times force the programmer to deal explicitly with local caching,
shared memory continues to provide a form of global name space that supports easy
copying of data from one location to another.

4) Dynamic fine-grain sharing is important for many applications. It is often
difficult to specify at creation time which data objects will be shared and which
private, which local and which global, which long-lived and which temporary. It can
be  particularly  difficult  to  specify  which  processes  will  need  access  to  specific  pieces 
of  data,  and  wasteful  to  require  processes  to  demonstrate  access  rights  for  data  they 
may  never  use.  Far  preferable  is  a  scheme  in  which  all  objects  are  potentially  sharable 
and  treated  uniformly,  with  access  control  and  other  bookkeeping  performed  as  late 
as  possible.  Such  a  scheme  provides  the  user  with  greater  latitude  in  program  design, 
minimizes  resource  usage,  and  facilitates  migration  to  maximize  locality  and  balance 
workloads. 

5)  Maximum  performance  and  flexibility  depend  on  a  low-level  kernel  interface. 
From  the  point  of  view  of  an  individual  application,  the  ideal  operating  system  prob¬ 
ably  lies  at  one  of  two  extremes:  it  either  provides  every  facility  the  application  needs, 
or  else  provides  a  flexible  and  efficient  set  of  primitives  from  which  those  facilities  can 
be  built.  A  kernel  that  lies  in  between  is  likely  to  be  both  awkward  and  slow:  awkward 
because  it  has  sacrificed  the  flexibility  of  the  more  primitive  system,  slow  because  it 
has  sacrificed  its  simplicity.  Moreover  a  kernel  with  a  high-level  interface  is  unlikely  to 
be  able  to  provide  facilities  acceptable  to  every  application.  Low-level  primitives  can 
be  much  more  universal.  They  imply  the  need  for  friendly  software  packages  that  run 
on  top  of  the  kernel  and  under  user  programs,  but  with  a  carefully-designed  interface 
these  can  be  as  efficient  as  kernel-level  code  and  much  less  difficult  to  change. 

6)  A  high-quality  programming  environment  is  essential.  Some  application  pro¬ 
grammers  in  our  department  who  could  have  exploited  the  parallelism  offered  by 
the  Butterfly  continued  to  use  Sun  workstations  and  VAXen.  These  programmers 
weighed  the  potential  speedup  of  the  Butterfly  against  the  programming  environ¬ 
ment  of  their  workstation  and  found  the  Butterfly  wanting.  Of  particular  importance 
are tools for parallel debugging. Our work with Instant Replay [LeBlanc and
Mellor-Crummey 1987; Fowler et al. 1988] clearly provided an important step in the right
direction.  A  high-quality  environment  for  performance  monitoring,  called  Moviola, 
was  also  created. 


6.3  Psyche 

Preliminary  ideas  for  Psyche  date  to  1986.  Design  work  began  in  earnest  in  1987, 
and  was  essentially  completed  by  the  summer  of  1988,  when  implementation  began 
on  the  BBN  Butterfly  Plus  multiprocessor.  Early  plans  for  Psyche  were  summarized 
in  a  1987  technical  report  [Scott  and  LeBlanc  1987].  Rationale  for  the  design  was 
presented  in  1988  [Scott  and  Marsh  1988].  Technical  reports  on  the  user/kernel 
interface  [Scott  et  al.  1989a]  and  the  memory  management  system  [LeBlanc  et  al. 
1989a] appeared in 1989, and were followed by workshop papers on open-systems
design and the kernel implementation [Scott et al. 1989b, c]. A detailed discussion of
multi-model  programming  appeared  at  the  1989  PPoPP  conference. 

The  design  of  Psyche  is  based  on  the  observation  that  access  to  shared  memory 
is the fundamental mechanism for interaction between threads of control on a multiprocessor.
Any other abstraction that can be provided on the machine must be built
from  this  basic  mechanism.  An  operating  system  whose  kernel  interface  is  based  on 
direct  use  of  shared  memory  will  thus  in  some  sense  be  universal. 

The  realm  is  the  central  abstraction  provided  by  the  Psyche  kernel.  Each  realm 
includes  data  and  code.  The  code  constitutes  a  protocol  for  manipulating  the  data  and 
for  scheduling  threads  of  control.  The  intent  is  that  the  data  should  not  be  accessed 
except  by  obeying  the  protocol.  In  effect,  a  realm  is  an  abstract  data  object.  Its 
protocol  consists  of  operations  on  the  data  that  define  the  nature  of  the  abstraction. 
Invocation  of  these  operations  is  the  principal  mechanism  for  communication  between 
parallel  threads  of  control. 

The  thread  is  the  abstraction  for  control  flow  and  scheduling.  All  threads  that 
begin  execution  in  the  same  realm  reside  in  a  single  protection  domain.  That 
domain  enjoys  access  to  the  original  realm  and  any  other  realms  for  which  access 
rights  have  been  demonstrated  to  the  kernel.  Part  of  the  layout  of  a  thread  context 
block  is  defined  by  the  kernel,  but  threads  themselves  are  created  and  scheduled  by 
the  user.  The  kernel  time-slices  on  each  processor  between  protection  domains  in 
which  threads  are  active,  providing  upcalls  at  quantum  boundaries  and  whenever 
else  a  scheduling  decision  is  required.  Context  switches  between  threads  in  the  same 
protection  domain  do  not  require  kernel  intervention.  In  addition,  a  standardized 
interface  to  scheduling  routines  allows  threads  of  different  types  to  block  and  unblock 
each  other. 

The relationship between realms and threads is somewhat unusual: the conventional
notion of an anthropomorphic process has no analog in Psyche. Realms are
passive  objects,  but  their  code  controls  all  execution.  Threads  merely  animate  the 
code;  they  have  no  “volition”  of  their  own. 

Depending  on  the  degree  of  protection  desired,  an  invocation  of  a  realm  operation 
can  be  as  fast  as  an  ordinary  procedure  call  or  as  slow  as  a  heavyweight  process 
switch.  We  call  the  inexpensive  version  an  optimized  invocation;  the  safer  version  is 
a  protected  invocation.  In  the  case  of  a  trivial  protocol  or  truly  minimal  protection, 
Psyche  also  permits  direct  external  access  to  the  data  of  a  realm.  One  can  think  of 
direct  access  as  a  mechanism  for  in-line  expansion  of  realm  operations.  By  mixing 
the  use  of  protected,  optimized,  and  in-line  invocations,  the  programmer  can  obtain 
(and  pay  for)  as  much  or  as  little  protection  as  desired. 

Keys  and  access  lists  are  the  mechanisms  used  to  implement  protection.  Each 
realm  includes  an  access  list  consisting  of  <key,  right>  pairs.  Each  thread  maintains 
a  list  of  keys.  The  right  to  invoke  an  operation  of  a  realm  is  conferred  by  possession 
of  a  key  for  which  appropriate  permissions  appear  in  the  realm’s  access  list.  A  key 
is a large uninterpreted value affording probabilistic protection. The creation and
distribution  of  keys  and  the  management  of  access  lists  are  all  under  user  control, 
enabling  the  implementation  of  many  different  protection  policies. 
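
As a rough illustration of the check described above, and not Psyche's actual
kernel code, matching a thread's keys against a realm's access list might be
sketched in C++ as follows. The type names, the 64-bit key, and the two rights
shown are assumptions made for the example; the report does not enumerate the
rights that Psyche defines.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical rights bitmask (the report does not list Psyche's rights).
enum Right : unsigned {
    kNone = 0,
    kInvokeProtected = 1u << 0,
    kInvokeOptimized = 1u << 1
};

using Key = std::uint64_t;  // stand-in for a "large uninterpreted value"

struct AccessEntry {
    Key key;
    unsigned rights;  // bitwise OR of Right values
};

struct Realm {
    std::vector<AccessEntry> access_list;  // <key, right> pairs
};

struct Thread {
    std::vector<Key> keys;  // keys the thread currently holds
};

// Union of the rights granted by every key the thread holds.
unsigned rights_for(const Thread& t, const Realm& r) {
    unsigned granted = kNone;
    for (Key k : t.keys)
        for (const AccessEntry& e : r.access_list)
            if (e.key == k) granted |= e.rights;
    return granted;
}

// True when some held key confers the requested right on this realm.
bool may_invoke(const Thread& t, const Realm& r, Right needed) {
    assert(needed != kNone);
    return (rights_for(t, r) & needed) != 0;
}
```

Because keys are plain values, a user-level policy can hand one to another thread
simply by copying it, which is consistent with the statement that key distribution
is under user control.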

If optimized (particularly in-line) invocations are to proceed quickly, they must
avoid  modification  of  memory  maps.  Every  realm  visible  to  a  given  thread  must 
therefore  occupy  a  different  location  from  the  point  of  view  of  that  thread.  In  addition, 
if  pointers  are  to  be  stored  in  realms,  then  every  realm  visible  to  multiple  threads 
must  occupy  the  same  location  from  the  point  of  view  of  each  of  those  threads.  In 
order to satisfy these two requirements, Psyche arranges for all coexistent sharable
realms  to  occupy  disjoint  locations  in  a  single,  global,  virtual  address  space.  Each 
protection  domain  may  have  a  different  view  of  this  address  space,  in  the  sense  that 
different  subsets  may  be  marked  accessible,  but  the  virtual  to  physical  mapping  does 
not  change. 

The  view  of  a  protection  domain  is  embodied  in  the  hardware  memory  map. 
Execution  proceeds  unimpeded  until  an  attempt  is  made  to  access  something  not 
included  in  the  view.  The  resulting  protection  fault  is  fielded  by  the  kernel,  whose 
job  it  is  to  either  (1)  announce  an  error,  (2)  update  the  current  view  and  restart  the 
faulting  instruction,  or  (3)  perform  an  upcall  into  the  protection  domain  associated 
with  the  target  realm,  in  order  to  create  a  new  thread  to  perform  the  attempted 
operation.  In  effect,  Psyche  uses  conventional  memory-management  hardware  as  a 
cache  for  software-managed  protection.  Case  (2)  corresponds  to  optimized  invocation. 
Future  invocations  of  the  same  realm  from  the  same  protection  domain  will  proceed 
without  kernel  intervention.  Case  (3)  corresponds  to  protected  invocation.  The  choice 
between  cases  is  made  by  matching  the  key  list  of  the  current  thread  against  the  access 
list  of  the  target  realm. 
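
The three outcomes of a protection fault can be summarized in a small C++ sketch.
It is only a model of the decision described above: the `Grant` structure stands
in for the result of matching the key list against the access list, realms are
identified by integers, and all names are hypothetical.

```cpp
#include <cassert>
#include <set>

enum class FaultAction { Error, ExtendView, Upcall };

// Hypothetical summary of what the key/access-list match granted.
struct Grant {
    bool optimized;   // caller may invoke the realm directly
    bool protected_;  // caller may invoke only via the realm's own domain
};

struct ProtectionDomain {
    std::set<int> view;  // ids of realms currently mapped as accessible
};

// True when an access proceeds without kernel involvement.
bool access_hits_view(const ProtectionDomain& d, int realm_id) {
    return d.view.count(realm_id) != 0;
}

// Called only when an access misses the view; the view acts as a cache.
FaultAction handle_fault(ProtectionDomain& d, int realm_id, Grant g) {
    assert(!access_hits_view(d, realm_id));
    if (g.optimized) {
        d.view.insert(realm_id);  // case (2): later accesses no longer fault
        return FaultAction::ExtendView;
    }
    if (g.protected_) return FaultAction::Upcall;  // case (3)
    return FaultAction::Error;                     // case (1)
}
```

In this model the first optimized invocation pays for one fault, after which
`access_hits_view` succeeds; a protected invocation faults every time.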

For  both  locality  and  communication,  the  philosophy  of  Psyche  is  to  provide  a 
fundamental,  low-level  mechanism  from  which  a  wide  variety  of  higher-level  facilities 
can  be  built.  Realms  with  appropriate  protocol  operations  can  be  used  to  implement 
the  following: 

1.  Pure  shared  memory  in  the  style  of  the  BBN  Uniform  System  [Thomas  1988]. 
A  single  large  collection  of  realms  would  be  shared  by  all  threads.  The  access 
protocol,  in  an  abstract  sense,  would  permit  unrestricted  reads  and  writes  of 
individual memory cells.

2.  Packet-switched  message  passing.  Each  message  would  be  a  separate  realm. 
To  send  a  message  one  would  make  the  realm  accessible  to  the  receiver  and 
inaccessible  to  the  sender. 

3.  Circuit-switched  message  passing,  in  the  style  of  Accent  [Rashid  and  Robertson 
1981]  or  Lynx  [Scott  1987].  Each  communication  channel  would  be  realized  as 
a  realm  accessible  to  a  limited  number  of  threads,  and  would  contain  buffers 
manipulated  by  protocol  operations. 

4.  Synchronization  mechanisms  such  as  monitors,  locks,  and  path  expressions. 
Each  of  these  can  be  written  once  as  a  library  routine  that  is  instantiated  as  a 
realm  by  each  abstraction  that  needs  it. 

5. Parallel data structures. Special-purpose locking could be implemented in a
collection  of  realms  scattered  across  the  nodes  of  the  machine,  in  order  to  reduce 
contention.  The  entry  routines  of  the  data  structure  as  a  whole  might  be  fully 
parallel,  able  to  be  executed  without  synchronization  until  access  is  required  to 
particular  pieces  of  the  data. 
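
To make item 2 concrete, the following C++ sketch models a message as a realm
whose accessibility is handed from sender to receiver. It is an illustration
under simplifying assumptions (threads are identified by integers and access is
tracked in an ordinary set); the names `MessageRealm`, `hand_off`, and `try_read`
are invented for the example and are not Psyche interfaces.

```cpp
#include <cassert>
#include <set>
#include <string>

struct MessageRealm {
    std::string payload;
    std::set<int> accessible_to;  // ids of threads that may touch the payload
};

// "Send": make the realm accessible to the receiver and inaccessible to the sender.
void hand_off(MessageRealm& m, int sender, int receiver) {
    assert(m.accessible_to.count(sender) == 1);  // only a current holder may send
    m.accessible_to.erase(sender);
    m.accessible_to.insert(receiver);
}

// A read succeeds only for a thread that currently has access.
bool try_read(const MessageRealm& m, int thread, std::string& out) {
    if (m.accessible_to.count(thread) == 0) return false;
    out = m.payload;
    return true;
}
```

No data is copied by `hand_off`; only the access information changes, which is
the point of treating each message as a separate realm.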

Psyche provides a low-level interface with uniform naming and an emphasis on
dynamic fine-grained sharing. Through its use of data abstraction, lazy evaluation of
protection, and parameterized user-level scheduling, it allows programs written
under many different programming models to coexist and interact. The conventions of
realm protocols, upcalls, and block and unblock routines provide a structure for
communication across models that is, to the best of our knowledge, unprecedented. With
appropriate permissions, user-level code can exercise full control over the physical
resources of memory, processors, and devices. In effect, it should be possible
under Psyche to implement almost any application for which the underlying hardware
is appropriate. This, for us, constitutes the definition of “general-purpose parallel
computing.”

Psyche differs from existing multiprocessor operating systems in several
fundamental ways.

1.  It  employs  a  uniform  name  (address)  space  for  all  its  user  programs  without 
relying  on  compiler  support  for  protection. 

2.  It  evaluates  access  rights  lazily,  permitting  the  distribution  of  rights  without 
kernel  intervention. 

3.  It  places  the  management  of  threads,  and  in  fact  their  definition,  in  the  hands 
of  user-level  code. 

4.  It  minimizes  the  need  for  kernel  calls  in  general  by  relying  whenever  possible 
on  shared  user/kernel  data  structures  that  can  be  examined  asynchronously. 

5. It provides the user with an explicit tradeoff between protection and
performance by facilitating the interchange of protected and optimized invocations.

The  kernel  provides  the  foundation  for  a  wide  variety  of  future  work  in  parallel 
systems  as  well  as  for  applications  (including  real-time  artificial  intelligence).  It  is 
conceived  as  a  lowest  common  denominator  for  a  multiprocessor  operating  system, 
providing  only  those  functions  necessary  to  access  physical  resources  and  implement 
protection in higher layers. The three fundamental kernel abstractions are the
segment, the address space, and the thread of control. All three are protected through
capabilities. Unusual features include an inter-address-space communication
mechanism based on explicit transfer of control between threads and a facility for reflecting
memory  protection  violations  upwards  into  user-space  fault  handlers. 

As of November 1989 we were able to run our first real user applications.
Implemented portions of Psyche include

• Low-level machine support: interrupt handlers, virtual memory (without
paging), full support for inter-kernel shared memory, synchronous inter-kernel
communication via remote interrupts, support for atomic hardware operations,
remote source-level kernel debugging, and loading of the kernel via Ethernet.

• Core support for the Psyche user interface: realms, virtual processors,
protection domains, keys and access lists, software interrupts, and protected and
optimized invocation of realm operations.

• Rudimentary I/O to the console serial device, and remote file service via
Ethernet.

• Minimal user-level tools: a simple shell, program loader, and name server;
support for command-line argument passing; simple handlers for software
interrupts; and standard I/O and kernel call libraries.

We  expect  our  work  on  Psyche  to  evolve  into  many  interrelated  projects.  We  are 
already  experimenting  with  novel  and  promising  approaches  to  memory  management, 
inter-node communication within the kernel, and support for remote debugging. We
are  working  to  develop  practical  techniques  to  maximize  locality  of  reference  through 
automatic  code  and  data  migration.  We  expect  our  future  efforts  to  include  work 
on  lightweight  process  structure,  implementation  and  evaluation  of  communication 
models, and parallel language design. The last of these is of particular interest. We
have  specifically  avoided  language  dependencies  in  the  design  of  the  Psyche  kernel.  It 
is  our  intent  that  many  languages,  with  widely  differing  process  and  communication 
models,  be  able  to  coexist  and  cooperate  on  a  Psyche  machine.  We  are  interested, 
however,  in  the  extent  to  which  the  Psyche  philosophy  itself  can  be  embodied  in  a 
programming  language. 

The  communications  facilities  of  a  language  enjoy  considerable  advantages  over  a 
simple  subroutine  library.  They  can  be  integrated  with  the  naming  and  type  structure 
of  the  language.  They  can  employ  alternative  syntax.  They  can  make  use  of  implicit 
context.  They  can  produce  language-level  exceptions.  For  us  the  question  is:  to  what 
extent  can  these  advantages  be  provided  without  insisting  on  a  single  communication 
model at language-design time? We expect this question to form the basis of future
work. 

The  Psyche  design  was  motivated  and  continues  to  be  driven  by  the  needs  of 
application  programs,  primarily  AI  applications.  Our  experiences  in  the  development 
of individual vision programs on the Butterfly provided the lessons upon which the
Psyche design was based. We successfully used the active vision and robotics project
as a vehicle for evaluating the Psyche design and implementation.

Our laboratory for active vision and robotics assumes a hardware configuration in
which  camera  output  is  fed  into  a  pipelined  image  processor  and  the  general-purpose 
multiprocessor  is  reserved  for  higher-level  planning  and  control.  Initially,  most  of 
these  higher-level  functions  were  performed  on  a  uniprocessor  Sun.  As  the  Psyche 
implementation  became  available,  some  of  these  functions  were  migrated  onto  the 
Butterfly. By making this migration an explicit part of the development process,
we permitted early work in the systems and application domains to proceed in a
semi-decoupled fashion, with neither on the other’s critical path. The success of
our  previous  efforts  in  operating  system  implementation  for  the  Butterfly  [Mellor- 
Crummey  et  al.  1987],  together  with  the  fact  that  Psyche  construction  is  now  well 
underway,  suggests  that  the  availability  of  the  operating  system  is  unlikely  to  be  a 
problem  in  later  phases  of  the  project. 

Research  in  this  direction  is  continuing,  with  further  hardware  support  provided 
by  an  NSF  IIP  grant.  Once  software  has  moved  to  the  Butterfly,  we  expect  our  higher- 
level  functions  to  involve  hundreds  of  parallel  threads  of  control.  Some  of  these  threads 
will  share  data  structures.  Others  will  interact  through  message  passing.  Some  will 
confine their activities to the multiprocessor. Others will interface to the image
processor  and  the  camera  and  robot  controls.  Those  that  share  data  are  likely  to 
differ  in  their  needs  for  synchronization  and  consistency. 


7  Programming  Environments  for  Pipelined  Parallel  Vision:  Zebra  and  Zed

Under the RADC contract, Rochester developed an object-oriented programming
interface to Datacube’s MaxVideo family of image processing boards. The system is
called  Zebra.  Zebra  is  not  simply  a  packaged  version  of  the  standard  Maxware  calls, 
but  rather  a  different  style  of  programming  for  the  Datacube  hardware. 

The  basic  philosophy  of  Zebra  is  two-fold.  First,  each  board  type  is  represented 
by  an  object  class.  Each  physical  MaxVideo  board  is  represented  by  an  instance  of 
its  class.  Simply  by  declaring  the  board  objects  as  variables,  the  boards  are  opened 
and initialized. Second, Zebra takes a microprogramming-like approach to
controlling Datacube boards. The register set for each board is considered to be a
microinstruction word. This instruction word completely specifies a board configuration.
By  sending  instruction  words  to  boards,  the  hardware  can  be  completely  programmed 
in  a  microprogramming-like  manner. 

Zebra application code consequently looks quite different from its Maxware
counterpart. The configuration of MaxVideo boards is not represented in the call
sequence of the application program but rather in a text file, which may be changed
without recompiling the application program. Thus the development process is
streamlined by requiring fewer compilations.
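
The idea of keeping a board configuration in a text file can be illustrated with
a short C++ sketch. It does not reproduce Zebra's classes or its actual file
format, neither of which is specified here: the "register value" line format, the
register names, and the `Board` class are all assumptions made for the example,
and an in-memory stream stands in for the configuration file.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// An "instruction word": one value per board register (register names are made up).
using InstructionWord = std::map<std::string, std::uint32_t>;

// Parse lines of the form "<register> <value>"; a leading '#' marks a comment line.
InstructionWord parse_instruction_word(std::istream& in) {
    InstructionWord word;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        std::istringstream fields(line);
        std::string reg;
        std::string value;
        if (fields >> reg >> value)  // base 0 accepts decimal and 0x-prefixed hex
            word[reg] = static_cast<std::uint32_t>(std::stoul(value, nullptr, 0));
    }
    return word;
}

// Stand-in for a board object: construction applies a complete configuration.
class Board {
public:
    explicit Board(std::istream& config) : regs_(parse_instruction_word(config)) {}

    // Reprogram the whole board in one step by sending a new instruction word.
    void load(const InstructionWord& word) { regs_ = word; }

    std::uint32_t reg(const std::string& name) const {
        auto it = regs_.find(name);
        assert(it != regs_.end());
        return it->second;
    }

private:
    InstructionWord regs_;
};
```

Switching such a program from one configuration to another means supplying a
different text stream to `parse_instruction_word`; the compiled code is unchanged.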

Instruction words can be stored in and retrieved from files, allowing the sharing
of standard configurations between developers. Instruction words are created and
modified via an instruction word editor. One such editor, “Zed,” is provided with
Zebra.

Zed  allows  a  programmer  to  create  a  new  instruction  word  or  modify  an  existing 
one  directly  from  the  keyboard.  This  instruction  word  may  then  be  saved  in  a  file 
or  loaded  directly  into  a  physical  board  for  testing.  This  allows  rapid  prototyping  of 
board  configurations. 

Some  details  of  Zebra  are  the  following. 

• It is object oriented and written in C++. It encapsulates each board as an
object, created and initialized upon declaration, that can be updated and queried.

•  It  leads  to  far  less  complicated  applications  code  than  Maxware. 

• It uses explicit board descriptions that both humans and programs can read and
write, which are a succinct and stable way to store, access, re-use, and share
board configurations.

•  It  is  not  based  on  any  other  interface  software  (it  does  not  use  Maxware  or  the 
Datacube  device  driver,  for  instance). 

•  It  already  runs  on  two  dissimilar  architectures  at  UR  (the  BBN-ACI  Butterfly 
Parallel  Processor  and  Suns).  It  only  assumes  a  memory-map  operating  system 
call  and  so  is  highly  portable  between  host  architectures. 

Rochester  has  also  developed  Zed,  which  is  released  with  Zebra.  Zed  has  the 
following  characteristics. 

•  It  is  an  illustrative  Zebra  application. 

•  It  provides  an  interactive,  menu-based  interface  for  board  configuration,  editing, 
and  experimentation. 

•  It  runs  on  any  standard  terminal,  and  under  Suntools  and  X-windows. 

•  It  allows  new  users  to  begin  using  Datacube  hardware  in  minutes. 

The  following  example  Zebra  program  uses  the  P3  bus  to  implement  a  full-frame 
continuous  transfer  of  image  data  from  Digimax  to  a  ROI-Store  512,  back  to  Digimax, 
and  up  onto  a  monitor. 

    main()
    {
        // create and init the boards
    1   dgBoard digimax(DG_00_BASE, DG_00_IVEC, "ZdgInit.zff");
    2   rsBoard rs0(RS_00_RBASE, RS_00_MBASE, RS_M512, RS_00_IVEC,
                    "ZrsCont512.zff");

        // fire the transfer
    3   rs0.fire(RS_READ);
    4   rs0.fire(RS_WRITE);
    }

Line one declares an object of class dgBoard with the name digimax. This opens
the board at VME address “DG_00_BASE” and initializes it with the configuration
in file “ZdgInit.zff”. Line two similarly declares a ROI-Store board object. Lines
three and four are analogous to the Maxware calls rsRFire and rsWFire,
respectively. Note that to change this program to do a single-shot “snapshot”
transfer, the configuration file can be changed without recompiling the program.
Alternatively, a different configuration file can be used. Zebra and Zed are
available free of charge by anonymous FTP from CS.Rochester.Edu.


8  Other  Programming  Libraries  and  Utilities  for 
MIMD  Parallelism 

Several  low-level  communications  utilities  were  written  to  support  the  interaction  of 
parallel image processing with action. Communication between the embedded
controller in the robot arm and controlling software on the host is via a 9600-baud
serial line. On top of the serial line is layered a reliable data link protocol, implemented
under  Unix  as  a  tty  line  discipline  and  in  the  robot  controller  as  a  part  of  the  VAL 
execution monitor. Above the data link layer is a protocol supporting multiple
logical channels between the robot and the host. The data link software was developed
and  distributed  by  the  Electrical  Engineering  Department  at  Purdue  University.  The 
logical  channel  software  (BOTLIB)  was  inspired  by  an  analogous  interface  developed 
at Purdue, but has been completely re-engineered at Rochester to provide more
flexibility and speed. It provides routines to get the current robot location in terms of
standard coordinates or joint angles, move the robot to a specified location in terms
of standard coordinates or joint angles, set the speed of the robot, and set the
location and orientation of the tool tip. The software is organized as a C language
library whose routines can be called from the application program.
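
The notion of several logical channels sharing one reliable link can be sketched
as follows. This is not the BOTLIB wire format, which is not described here; the
two-byte header (channel id, payload length) and the function names are
assumptions chosen to keep the example small, and the sketch presumes that the
layer below already delivers bytes reliably and in order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical frame layout: one byte of channel id, one byte of length, payload.
Bytes make_frame(std::uint8_t channel, const std::string& payload) {
    assert(payload.size() <= 255);
    Bytes f;
    f.push_back(channel);
    f.push_back(static_cast<std::uint8_t>(payload.size()));
    f.insert(f.end(), payload.begin(), payload.end());
    return f;
}

// Split a byte stream into frames and append each payload to its channel's buffer.
// Returns false if the stream ends in the middle of a frame.
bool demultiplex(const Bytes& stream, std::map<int, std::string>& channels) {
    std::size_t i = 0;
    while (i < stream.size()) {
        if (stream.size() - i < 2) return false;        // header cut short
        std::size_t len = stream[i + 1];
        if (stream.size() - i - 2 < len) return false;  // payload cut short
        channels[stream[i]].append(stream.begin() + i + 2,
                                   stream.begin() + i + 2 + len);
        i += 2 + len;
    }
    return true;
}
```

With such framing, a position query and a speed command can be interleaved on the
same serial line and still be delivered to separate consumers on the other side.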

An  alternate  C  library  (ROBOCOMM)  was  written  by  Brian  Yamauchi  for  use 
in  the  Juggler  project  (see  below).  ROBOCOMM  is  much  faster  than  the  BOTLIB 
package  since  it  does  not  use  the  multi-layered,  reliable  ISO-standard  structure  for 
communication. 

Work  in  these  areas  is  continuing  past  the  contract  period.  Connection  between 
the  Butterfly  serial  ports  and  the  robot  is  being  explored  by  Mark  Crovella,  who  is 
adding  Psyche  capabilities  to  manage  such  communications.  When  complete,  this 
facility  will  give  individual  Butterfly  processors  the  ability  to  communicate  directly 
with  the  robot. 

Under the RADC contract, Rochester developed several compilers, program
libraries, systems utilities for communication, and file systems. The results at the end
of the contract period span a broad range from parallel file systems through new
languages for expressing parallel computation. Applications packages such as the current
version of the neural net simulator [Fanty 1986, 1988; Goddard et al. 1989] and the
image-processing utilities produced throughout the contract period allow speedups
of up to a factor of 100 over single-workstation implementations [Olson et al. 1987;
Olson 1986b,c]. User interfaces to large multiprocessor computers are a difficult issue,
but we have contributed to that as well [Scott and Yap 1988; Yap and Scott 1990;
Olson 1986a], and we are still working to extend the range of computational models
available to a user. For instance, the Ant Farm project provides the basic capability
to support many lightweight processes.

“An Empirical Study of Message-Passing Overhead,” by M. L. Scott and A. L.
Cox, appeared at the 7th International Conference on Distributed Computing
Systems in Berlin, West Germany in September 1987. It reports on efforts to optimize the
performance of the LYNX run-time support package, and presents a detailed
breakdown of costs in the final implementation. This breakdown (1) reveals the marginal
cost of various features of LYNX, (2) carries important implications for the costs of
related features in other languages, and (3) sets an example for similar studies in other
environments. Other work in this important effort of quantifying parallel behavior is
also documented in [Floyd 1989; LeBlanc et al. 1988; LeBlanc 1988a, 1988b; Scott
and Cox 1987].

The  “Ant  Farm”  library  package  was  used  to  develop  applications  [Scott  and  Jones 
1989].  It  supports  extremely  large  numbers  (c.  25,000)  of  lightweight  processes  in 
Modula-2  with  location-transparent  communication. 

We constructed and studied the performance of a novel operating system for the
Butterfly, called Elmwood. “Elmwood-An Object-Oriented Multiprocessor Operating
System” appeared in Software Practice and Experience [Mellor-Crummey et al. 1987;
LeBlanc et al. 1989].

“Crowd Control: Coordinating Processes in Parallel” by T.J. LeBlanc and S. Jain
appeared in the Proc. of the International Conference on Parallel Processing. This
paper  describes  a  library  package  for  the  Butterfly  that  can  be  used  to  create  a  parallel 
schedule  for  large  numbers  of  processes.  A  partial  order  is  imposed  on  the  execution 
based  on  an  arbitrary  embedding  of  processes  in  a  balanced  binary  tree  [LeBlanc  and 
Jain  1987]. 
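
The benefit of embedding processes in a balanced binary tree can be seen in a
small C++ model. This is not the Crowd Control package itself: it assumes a
heap-style numbering in which process i starts processes 2i+1 and 2i+2, and it
assumes that a parent starts one child per time step. Both assumptions are made
only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Heap-style embedding: process i starts processes 2i+1 and 2i+2 (if they exist).
std::vector<std::size_t> children(std::size_t i, std::size_t n) {
    std::vector<std::size_t> c;
    if (2 * i + 1 < n) c.push_back(2 * i + 1);
    if (2 * i + 2 < n) c.push_back(2 * i + 2);
    return c;
}

// Step at which each process is started, assuming a parent starts one child per step.
std::vector<std::size_t> start_steps(std::size_t n) {
    assert(n > 0);
    std::vector<std::size_t> step(n, 0);  // the root is already running at step 0
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t next = step[i];       // a parent's step is final before its children's
        for (std::size_t c : children(i, n)) step[c] = ++next;
    }
    return step;
}

// Number of steps until every one of the n processes has been started.
std::size_t steps_to_start_all(std::size_t n) {
    std::size_t last = 0;
    for (std::size_t s : start_steps(n))
        if (s > last) last = s;
    return last;
}
```

Under these assumptions 1023 processes are all running after 18 steps, whereas a
single creator starting them one at a time would need 1022.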

Other utilities developed over the contract period include the Bridge file system
for parallel I/O, by Peter Dibble [Dibble et al. 1988; Dibble and Scott 1989a,b], the
Platinum and Osmium systems for automatically resolving caching and non-uniform
reference problems in SIMD-like computations [Fowler and Cox 1988a,b; Cox and
Fowler 1989], and many other pieces of work cited in the references [Olson 1986a;
Mellor-Crummey 1987; Gafter 1987, 1988; Bolosky 1989].

Characteristics of several programming utilities are compared in Table 2, which
also includes some well-known, commercially available programming systems for
NUMA MIMD computers such as the Butterfly (Uniform System, Emerald, Linda). This
extensive experience in implementing and analyzing the performance of parallel
programming models has naturally led to the ideas behind the Psyche system [Scott and
LeBlanc 1987; Scott et al. 1988, 1989a,b,c, 1990].


9  Programming  Environments  for  MIMD  Paral¬ 
lelism 

A  major  portion  of  the  work  under  the  RADC  contract  concentrated  on  problems  of 
monitoring  and  debugging  programs  for  the  parallel  vision  environment.  Rochester 
developed  many  tools  to  help  the  user  effectively  implement  parallel  algorithms  [e.g. 
LeBlanc  1989;  LeBlanc  et  al.  1990;  Mellor-Crummey  1988,  1989].  The  main  thrust 
has been the construction of parallel performance monitoring tools and
experimentation with the use of these tools [e.g. Fowler and Bella 1989; Fowler et al. 1989].

One of the most serious problems in the development cycle of large-scale parallel
programs is the lack of tools for debugging and performance analysis. Three issues
complicate parallel program analysis. First, parallel programs can exhibit
nonrepeatable behavior, limiting the effectiveness of traditional cyclic debugging techniques.
Second, interactive analysis, frequently employed for sequential programs, can distort
a parallel program’s execution behavior beyond recognition. Third, comprehensive
analysis of a parallel program’s execution requires collection, management, and
presentation of an enormous amount of data. Our work addressed all of these problems.

Our  work  has  been  different  from  other  research  in  parallel  program  analysis  in 
two key respects. First, our focus was on large-scale, shared-memory
multiprocessors. Second, our approach integrated debugging and performance analysis, using a
common  representation  of  program  executions. 

| Package | Processes | Scheduling | Communication | Synchronization | Protection |
|---|---|---|---|---|---|
| Uniform System | procedure weight | concurrent; run to completion | shared memory | spin locks, atomic queues | none |
| Lynx | one per address space; multi-threaded | processes concurrent; threads run until blocked | RPC | implicit in communication | between processes |
| SMP | one per address space | concurrent, preemptible | non-blocking messages | implicit in communication | between processes |
| Chrysalis++ | one per address space | concurrent, preemptible | shared memory | events, atomic queues | between processes |
| Ant Farm | coroutine weight, statically located | run until blocked within a processor | shared memory | events, monitors, queues, semaphores | none |
| Multilisp | coroutine weight | concurrent, preemptible | shared memory | monitors; implicit in expression evaluation | none |
| Platinum | multiple per address space; kernel managed | concurrent, preemptible | shared memory | spin locks; [illegible] | between address spaces |
| Elmwood | multiple per address space; kernel managed | concurrent, preemptible; move between objects | object invocation; shared memory within objects | implicit in invocation; semaphores and conditions within objects | between address spaces (objects) |
| Emerald | coroutine weight | concurrent, preemptible; move between objects | object invocation; shared memory within objects | implicit in invocation; monitors within objects | between objects (compiler enforced) |
| Linda | unspecified | concurrent, preemptible | shared associative store | implicit in store accesses | unspecified; often provided |

Table 2: Programming systems (six developed at Rochester) for NUMA MIMD computers.

The core of our toolkit consists of facilities for recording execution histories, a
common  user  interface  for  the  interactive,  graphical  manipulation  of  those  histories, 
and  tools  for  examining  and  manipulating  program  state  during  replay  of  a  previously 
recorded  execution.  These  facilities  form  a  foundation  upon  which  we  can  construct 
more  complex  tools  such  as  symbolic  debuggers,  execution  profilers,  and  performance 
analyzers. 

We  have  constructed  a  set  of  tools  for  instrumenting  parallel  programs  on  the 
Butterfly  for  performance  analysis.  Each  process  in  an  instrumented  program  records 
on  its  own  “history  tape”  each  of  its  interactions  with  shared  objects  including  the 
relative  timing  of  the  operations. 

An  execution  history  is  represented  naturally  as  a  directed  acyclic  graph  (DAG) 
of  process  interactions.  Nodes  in  the  graph  correspond  to  monitored  events  that 
took  place  during  execution.  Each  event  represents  an  operation  on  a  shared  object. 
Events  within  a  process  are  linked  by  arcs  denoting  a  temporal  relation  based  on 
a  local  time  scale.  Arcs  between  events  in  different  processes  denote  interprocess 
communication  and  synchronization. 

The  collection  of  history  tapes  from  the  individual  processes  can  be  combined  to 
give  a  consistent  view  of  the  execution  of  the  program  as  a  whole.  This  view  contains 
information  useful  for  identifying  critical  paths,  bottlenecks,  and  hot  spots  in  the 
program. 
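
One way to picture how per-process history tapes can be combined is sketched
below in C++. The sketch assumes that each shared object carries a version number
and that every tape entry records the version written or read; this follows the
general idea of version-based recording but is not taken from the toolkit's
source, and all type and function names are hypothetical. Interprocess arcs are
recovered by pairing each read with the write that produced the version it saw.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct TapeEntry {
    int object;           // shared object touched
    unsigned version;     // version produced (write) or observed (read)
    bool is_write;
    unsigned local_time;  // per-process clock; not comparable across processes
};
using Tape = std::vector<TapeEntry>;  // one tape per process

struct EventId {
    std::size_t process;
    std::size_t index;  // position on that process's tape
};
using Arc = std::pair<EventId, EventId>;  // (producing write, dependent read)

// Pair every read with the write, on another tape, that produced the version read.
std::vector<Arc> interprocess_arcs(const std::vector<Tape>& tapes) {
    std::map<std::pair<int, unsigned>, EventId> writer;
    for (std::size_t p = 0; p < tapes.size(); ++p)
        for (std::size_t i = 0; i < tapes[p].size(); ++i) {
            const TapeEntry& e = tapes[p][i];
            // Entries on a single tape are expected in local time order.
            assert(i == 0 || tapes[p][i - 1].local_time <= e.local_time);
            if (e.is_write)
                writer[std::make_pair(e.object, e.version)] = EventId{p, i};
        }
    std::vector<Arc> arcs;
    for (std::size_t p = 0; p < tapes.size(); ++p)
        for (std::size_t i = 0; i < tapes[p].size(); ++i) {
            const TapeEntry& e = tapes[p][i];
            if (e.is_write) continue;
            auto it = writer.find(std::make_pair(e.object, e.version));
            if (it != writer.end() && it->second.process != p)
                arcs.push_back(Arc(it->second, EventId{p, i}));
        }
    return arcs;
}
```

Note that the local clocks never need to be compared across processes; the
version numbers alone determine which event on one tape precedes which event on
another.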

An execution of a parallel program instrumented for performance monitoring
generates a massive amount of data. This data is incomprehensible in its raw form,
so we developed an interactive graphical display and analysis program called
Moviola. Moviola features a flexible user interface (graphics and LISP) and analytic tools
(critical path analysis).

The “streams” package, part of the NFS (Network File System) interface to the
Butterfly, was implemented. Mellor-Crummey produced an integrated instrumentation
package that extends Instant Replay with the performance monitoring package.
This uses the streams package for asynchronous transfer of “history data.”

Using Moviola and the instrumentation package, we experimented with debugging,
performance analysis, and tuning. Mellor-Crummey’s thesis demonstrated their
effects in the development of parallel sorting programs [Mellor-Crummey 1989].

9.1  Performance  Monitoring  and  Debugging 

Parallel  programming  requires  that  programmers  deal  with  new  and  unfamiliar  ab¬ 
stractions,  often  using  tools  designed  for  sequential  programs.  Debugging  is  compli¬ 
cated  by  parallelism,  and  traditional  cyclic  techniques  for  debugging  may  not  help, 
since many parallel programs have non-repeatable behavior. Program profilers are
of little use in performance tuning, since it may be difficult to determine the impact
of  an  individual  process  on  overall  performance,  the  effects  of  process  decomposi¬ 
tion,  and  the  outcome  of  specific  optimizations.  Tools  that  report  the  instantaneous 
level  of  parallelism  can  illustrate  how  well  the  program  is  performing,  but  provide  no 
guidance  on  how  to  improve  parallelism. 

For  the  past  four  years  we  have  been  developing  a  toolkit  for  debugging  and  per¬ 
formance  analysis  of  parallel  programs  on  large-scale  shared-memory  multiprocessors. 
Our approach is to use program replay in cyclic, post-mortem analysis. Cyclic
debugging assumes that experiments are interactive and repeatable, and that all relevant
program behavior is observable. Unlike other work, such as Behavioral Abstraction
or PIE, in which monitoring software filters relevant information during execution, we
save enough information to reproduce an execution for detailed analysis off-line. A
distinguishing characteristic of our work is the integration of debugging and
performance analysis, based on a common underlying representation of program executions.

In  parallel  program  analysis,  the  focus  of  concern  is  no  longer  simply  the  internal 
state  of  a  single  process,  but  must  include  internal  states  of  (potentially)  many  dif¬ 
ferent  processes  and  the  interactions  among  processes.  A  cyclic  methodology  can  still 
be  used,  but  four  issues  that  complicate  analysis  must  first  be  addressed:  (1)  parallel 
programs  often  exhibit  nonrepeatable  behavior,  (2)  interactive  analysis  can  distort  a 
parallel  program’s  execution,  (3)  analysis  of  large-scale  parallel  programs  requires  the 
collection,  management,  and  presentation  of  an  enormous  amount  of  data,  and  (4) 
the  execution  environment,  which  must  admit  extensive  parallelism,  and  the  analysis 
environment,  which  must  provide  a  single,  comprehensive  user-interface,  may  differ 
dramatically.  Our  research  is  currently  devoted  to  addressing  each  of  these  issues. 


9.2  Monitoring  Parallel  Programs 

Monitoring  parallel  programs  for  cyclic  debugging  requires  that  essential  information 
be  extracted  during  execution  to  allow  for  reproducible  experiments.  Unfortunately, 
parallel  programs  may  exhibit  timing-dependent  behavior  due  to  race  conditions  in 
synchronization  or  programmer  intervention  during  debugging.  To  allow  cyclic  de¬ 
bugging  and  reproducible  behavior  during  debugging,  the  monitoring  system  must 
capture  both  program  state  information  and  relative  timing  information. 

Several message-based debuggers have been developed that record the contents of
each message sent in the system in an event log. The programmer can either review
the  messages  in  the  log,  in  an  attempt  to  isolate  errors,  or  the  events  can  be  used  as 
input  to  replay  the  execution  of  a  process  in  isolation.  Experiments  with  executions 
can  be  reproduced  by  presenting  the  same  messages  to  each  process  in  the  proper 
sequence. 

Our  approach  to  monitoring  for  shared-memory  parallel  programs  is  based  on  a 
partial order of accesses to shared objects. In this approach, all interactions between
processes are modeled as operations on shared objects. During program execution
each  process  records  a  history  of  its  accesses  to  shared  objects,  collecting  a  trace 
of  all  synchronization  events  that  occur.  The  union  of  the  individual  process  histo¬ 
ries  specifies  a  partial  order  of  accesses  to  each  shared  object.  This  partial  order, 
together  with  the  source  code  and  input,  characterizes  an  execution  of  the  parallel 
program.  Since  an  execution  history  contains  only  synchronization  information,  it 
is  much  smaller  than  a  record  of  all  data  exchanged  between  processes,  making  it 
relatively  inexpensive  to  capture. 
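
The record-and-replay idea can be sketched in Python as follows. This is an
illustration under stated assumptions, not the Butterfly implementation: the
class and function names are hypothetical, a per-object version number is
assumed as the way to encode the order of accesses, and every access is
treated as exclusive for simplicity.

```python
import threading

class SharedObject:
    """A shared object whose version counts the accesses completed so far."""
    def __init__(self, name):
        self.name = name
        self.version = 0
        self.cond = threading.Condition()
        self.order = []  # process ids, in the order accesses occurred

class HistoryTape:
    """One process's record of the object versions it accessed."""
    def __init__(self, entries=None):
        self.replaying = entries is not None
        self.entries = list(entries or [])
        self.position = 0

def access(tape, obj, pid):
    """Perform one monitored access to obj on behalf of process pid."""
    with obj.cond:
        if tape.replaying:
            _, wanted = tape.entries[tape.position]
            tape.position += 1
            # Block until the object reaches the version seen when recording.
            obj.cond.wait_for(lambda: obj.version == wanted)
        else:
            tape.entries.append((obj.name, obj.version))
        obj.order.append(pid)  # stands in for the real operation
        obj.version += 1
        obj.cond.notify_all()

def run(tapes, accesses_per_process=1):
    """Run one thread per tape against a single shared object."""
    obj = SharedObject("counter")

    def worker(pid):
        for _ in range(accesses_per_process):
            access(tapes[pid], obj, pid)

    threads = [threading.Thread(target=worker, args=(pid,)) for pid in tapes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return obj.order

# Record one execution, then replay it from the tapes alone.
recorded = {pid: HistoryTape() for pid in range(3)}
first_order = run(recorded, accesses_per_process=4)
replayed = {pid: HistoryTape(t.entries) for pid, t in recorded.items()}
second_order = run(replayed, accesses_per_process=4)  # same as first_order
```

Note that each tape holds only object names and version numbers, never the
data exchanged, which is why such a history stays small.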

In  addition  to  race  conditions,  other  nondeterministic  execution  properties,  such 
as  asynchronous  interrupts,  can  cause  nonreproducible  behavior.  We  have  developed  a 
software  instruction  counter  to  reproduce  these  events  during  program  replay  [Mellor- 
Crummey  and  LeBlanc  1989]. 

9.3  A  Toolkit  for  Parallel  Program  Analysis 

The  information  we  collect  during  program  monitoring  can  be  used  to  replay  a  pro¬ 
gram  during  the  debugging  cycle.  During  replay,  events  can  be  observed  at  any  level 
of  detail,  and  controlled  experiments  can  be  performed.  More  important,  however,  is 
that  we  use  program  monitoring  to  create  a  representation  for  an  execution  that  can 
be  analyzed  by  our  programmable  toolkit. 

The  core  of  our  toolkit  consists  of  facilities  for  recording  execution  histories,  a 
common  user  interface  for  the  interactive,  graphical  manipulation  of  those  histories, 
and  tools  for  examining  and  manipulating  program  state  during  replay  of  a  previously 
recorded  execution.  The  user  interface  for  the  toolkit  resides  on  the  programmer’s 
workstation  and  consists  of  two  major  components:  an  interactive,  graphical  browser 
for analyzing execution histories, and a programmable Lisp environment. The
execution history browser, called Moviola, is written in C and runs under the X Window
System.

Moviola  implements  a  graphical  view  of  an  execution  based  on  a  DAG  represen¬ 
tation  of  processes  and  communication.  Moviola  gathers  process-local  histories  and 
combines  them  into  a  single,  global  execution  history  in  which  each  edge  represents  a 
temporal  relation  between  two  events.  In  a  Moviola  diagram,  time  flows  from  top  to 
bottom.  Events  that  occur  within  a  process  are  aligned  vertically,  forming  a  time-line 
for  that  process.  Edges  joining  events  in  different  processes  reflect  temporal  rela¬ 
tionships  resulting  from  synchronization.  Event  placement  is  determined  by  global 
logical  time  computed  from  the  partial  order  of  events  collected  during  execution. 
Each event is displayed as a shaded box with height proportional to the duration of
the event (e.g., Fig. 17).
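
One plausible way to derive such a layout from the partial order is sketched
below in Python. The rule used here (an event starts at the latest finish time
among its predecessors) and all names are assumptions for illustration; the
report does not specify how Moviola computes global logical time.

```python
from graphlib import TopologicalSorter  # Python 3.9 or later

def layout(events, arcs):
    """Assign each event a (column, top, height) box.

    events maps an event id to (process, duration); arcs is an iterable
    of (earlier, later) pairs taken from the partial order.
    """
    predecessors = {event: set() for event in events}
    for earlier, later in arcs:
        predecessors[later].add(earlier)
    processes = sorted({process for process, _ in events.values()})
    columns = {process: i for i, process in enumerate(processes)}
    finish, boxes = {}, {}
    # static_order() yields every event after all of its predecessors.
    for event in TopologicalSorter(predecessors).static_order():
        process, duration = events[event]
        top = max((finish[p] for p in predecessors[event]), default=0)
        finish[event] = top + duration
        boxes[event] = (columns[process], top, duration)
    return boxes

events = {"a": ("P0", 2), "b": ("P0", 3), "c": ("P1", 1), "d": ("P1", 2)}
arcs = [("a", "b"), ("c", "d"), ("a", "d"), ("d", "b")]
boxes = layout(events, arcs)
# boxes == {"a": (0, 0, 2), "c": (1, 0, 1), "d": (1, 2, 2), "b": (0, 4, 3)}
```

Events of one process share a column, so each process forms a vertical
time-line, and the vertical offset grows with logical time from top to bottom.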

Moviola’s user interface provides a rich set of operations to control the graphical
display. Several interactive mechanisms, including independent scaling in two
dimensions, zoom, and smooth panning, allow the programmer to concentrate on interesting
portions of the graph. Individual events can be selected for analysis using the mouse;
the  user  has  control  over  the  amount  and  type  of  data  displayed  for  selected  events. 
The  user  can  also  control  which  processes  are  displayed  and  how  they  are  displayed. 
By  choosing  to  display  dependencies  for  a  subset  of  the  shared  objects,  screen  clutter 
can  be  reduced. 

Many  different  analyses  are  possible  based  on  this  graphical  view  of  an  execution, 
but  the  sheer  size  of  an  execution  history  graph  makes  it  impractical  to  base  all 
analyses  on  manual  manipulation  of  the  graph.  Extensibility  and  programmability 
are  provided  by  running  all  tools  under  the  aegis  of  Common  Lisp.  Tools  can  take 
the  form  of  interpreted  Lisp,  compiled  Lisp,  or,  like  Moviola,  foreign  code  loaded 
into  the  Lisp  environment.  Our  programmable  interface  enables  a  user  to  write  Lisp 
code  to  traverse  the  execution  graph  built  by  Moviola  to  gather  detailed,  application- 
specific  performance  statistics.  The  programmable  interface  is  especially  useful  for 
performing  well-defined,  repetitive  tasks,  such  as  gathering  the  mean  and  standard 
deviation  of  the  time  it  takes  processes  to  execute  parts  of  their  computation,  or  how 
much  waiting  a  process  performs  during  each  stage  of  a  computation. 
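
The kind of repetitive statistic described above might look like the following
sketch. The toolkit's programmable interface is Lisp; Python is used here only
for illustration, and the event tuple layout and function name are
hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def waiting_by_stage(events):
    """Summarize waiting time per stage across processes.

    events is an iterable of (process, stage, kind, duration) tuples,
    where kind is "wait" or "compute". Returns {stage: (mean, std_dev)}
    computed over the per-process waiting totals for that stage.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for process, stage, kind, duration in events:
        # Adding 0.0 for compute events still registers the process.
        totals[stage][process] += duration if kind == "wait" else 0.0
    summary = {}
    for stage, per_process in totals.items():
        values = list(per_process.values())
        summary[stage] = (mean(values), pstdev(values))
    return summary

events = [
    ("P0", 1, "wait", 2.0), ("P0", 1, "compute", 5.0),
    ("P1", 1, "wait", 4.0),
    ("P0", 2, "wait", 1.0), ("P1", 2, "wait", 1.0),
]
summary = waiting_by_stage(events)
# Stage 1: mean 3.0, standard deviation 1.0; stage 2: mean 1.0, deviation 0.0
```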

The  programmable  interface  can  also  be  used  to  create  different  views  of  an  exe¬ 
cution.  We  might  want  to  use  program  animation  to  analyze  dynamic  activity  over 
static  communication  channels,  or  application-specific  views  to  describe  the  progress 
of  a  computation  in  terms  of  the  program,  rather  than  the  low-level  view  provided 
by  Moviola.  For  performance  analysis,  the  performance  graphs  produced  by  PIE  or 
SeeCube  are  much  more  effective  than  a  synchronization  DAG.  Our  current  work  is 
using the programmable interface to extend the range of views for an execution
available to users, from application-specific views to detailed performance graphs
(Fig. 18).

We  have  already  constructed  a  mechanism  for  remote,  source-level  debugging  for 
Psyche,  in  the  style  of  the  Topaz  TeleDebug  facility  developed  at  DEC  SRC.  An 
interactive  front  end  runs  on  a  Sun  workstation  using  the  GNU  gdb  debugger.  The 
debugger  communicates  via  UDP  with  a  multiplexor  running  on  the  Butterfly’s  host 
machine. The multiplexor in turn communicates with a low-level debugging stub (lld)
that underlies the Psyche kernel.

We have successfully used this facility for kernel debugging and plan to use it as
a base for user-level, multi-model debugging. Low-level debugger functions will be
implemented by a combination of gdb and lld. High-level commands from the user
will be translated by a model-specific interface, created as part of the programming
model.

In addition, debugger facilities have been implemented to enable complex debugger
queries and conditional breakpoints during execution. The toolkit has been integrated
with an extended version of the gdb debugger, enabling source-level debugging during
replay of multiprocess programs. The Moviola graphical interface has been improved,
significantly reducing the display time and increasing the functionality. The S
graphics package has been added to the toolkit, facilitating graphical displays of
performance data. LISP tools have been written for critical path analysis and for gathering
and plotting performance statistics. All displays in the toolkit can be reproduced as
hardcopy using PostScript format.

Figure 18: A perspective plot of communication time per process per row for Gaussian
Elimination, as produced by the toolkit. The x-axis corresponds to the 36 processes
in the computation, the y-axis corresponds to rounds of communication, one per pivot
row, and the z-axis is communication time for a round. The plot shows the increase
in communication time (caused by contention) as the computation progresses.

We  are  beginning  to  explore  the  relationship  between  program  analysis,  program¬ 
ming  model  (process  and  communication  semantics),  and  visualization.  We  are  in¬ 
vestigating  techniques  that  can  be  used  across  several  parallel  programming  models, 
and  a  tool  interface  that  allows  a  programmer  to  debug  using  the  primitives  provided 
by  a  particular  programming  model.  Our  goal  is  to  (a)  provide  a  framework  that 
unifies  our  approach,  as  embodied  in  our  toolkit,  with  the  various  techniques  for  pro¬ 
gram  monitoring  and  visualization  that  have  been  described  in  the  literature  and  (b) 
develop  a  methodology,  and  corresponding  tools,  for  parallel  program  analysis  that 
can  be  used  step-by-step  by  programmers  for  the  entire  software  development  cycle, 
from  initial  debugging  to  performance  modeling  and  extrapolation. 


10  Technology  Transfer 

Under  the  contract  Rochester  developed  large  amounts  of  Butterfly  applications  soft¬ 
ware,  the  Connectionist  Simulator,  and  the  Zebra/Zed  system  for  object-oriented 
register  level  programming.  The  simulator  and  Zebra/Zed  are  available  by  anony¬ 
mous  ftp  or  magnetic  media,  and  hundreds  of  copies  have  been  sent  out  worldwide. 

Rochester  has  a  substantial  Industrial  Affiliates  Program,  with  industrial  partners 
including  BBN,  GE,  Kodak,  and  Xerox.  In  the  recent  past,  we  have  had  active 
research  collaboration  in  the  areas  of  vision,  reasoning,  and  parallel  programming 
environments  with  each  of  these  affiliates.  We  have  an  annual  meeting  to  keep  affiliates 
abreast  of  our  work,  and  to  keep  them  aware  of  students  and  personnel  here  with 
whom  they  may  have  interests  in  common.  Rochester  students  normally  spend  one  or 
two  summer  terms  working  in  industry,  and  the  resulting  ties  with  IBM,  GE  Research, 
GM  Research,  AT&T,  and  Xerox  (both  PARC  and  Webster  Research  Centers)  are 
healthy and strong. These couplings are often demonstrated in observable product
(the indefinite loan of the IBM ACE computer to Fowler, the joint publications of
Swain and J.L. Mundy of GE Research, etc.).

Rochester participated in the first DARPA parallel vision architecture benchmark,
and the resulting applications software (as well as the other programming libraries and
facilities we have developed) is disseminated through BBN. Rochester’s large and
well-subscribed technical reports service distributes reports to hundreds of industrial
and academic sites monthly.

There  is  evidence  that  scientific  papers  have  transferred  some  of  the  technology 
successfully:  the  Instant  Replay  system  was  implemented  on  Sequent  computers  by 
a group in Germany, for example. Through an international computer newsgroup
the expertise on the DataCube pipelined processor is both shared and acquired. The
Rochester  Connectionist  Simulator  and  the  Zebra/Zed  systems  are  available  by  anony¬ 
mous  ftp.  Together  they  have  been  distributed  to  several  hundred  sites  worldwide. 


11  Thesis  Abstracts 

Several  theses  appeared  during  the  contract  period  that  were  directly  related  to  the 
contract.  Many  more  were  initiated  during  the  contract  period  and  have  been  com¬ 
pleted  since,  or  are  still  (1990)  in  process.  The  following  are  representative  of  earlier 
work  under  the  contract. 

Aloimonos, J., “Computing intrinsic images,” Ph.D. Thesis and TR 198, August
1986:  Several  theories  have  been  proposed  in  the  literature  for  the  computation  of 
shape  from  shading,  shape  from  texture,  retinal  motion  from  spatiotemporal  deriva¬ 
tives  of  the  image  intensity  function,  and  the  like.  However:  (1)  The  employed 
assumptions are not present in a large subset of real images. (2) Usually the
natural constraints do not guarantee unique answers, calling for strong additional
assumptions about the world. (3) Even if physical constraints guarantee unique answers, often the
resulting  algorithms  are  not  robust.  This  thesis  shows  that  if  several  available  cues 
are  combined,  then  the  resulting  algorithms  compute  intrinsic  parameters  (shape, 
depth,  motion,  etc.)  uniquely  and  robustly.  The  computational  aspect  of  the  theory 
envisages  a  cooperative  highly  parallel  implementation,  bringing  in  information  from 
five  different  sources  (shading,  texture,  motion,  contour  and  stereo),  to  resolve  ambi¬ 
guities  and  ensure  uniqueness  and  stability  of  the  intrinsic  parameters.  The  problems 
of  shape  from  texture,  shape  from  shading  and  motion,  visual  motion  analysis,  and 
shape  and  motion  from  contour  are  analyzed  in  detail. 

Bandopadhay,  A.,  “A  computational  study  of  rigid  motion  perception,”  Ph.D.  The¬ 
sis  and  TR  221,  December  1986:  The  interpretation  of  visual  motion  is  investigated. 
The  task  of  motion  perception  is  divided  into  two  major  subtasks:  (1)  estimation  of 
two-dimensional  retinal  motion,  and  (2)  computation  of  parameters  of  rigid  motion 
from  retinal  motion.  Retinal  motion  estimation  is  performed  using  a  point  matching 
algorithm  based  on  local  similarity  of  matches  and  a  global  clustering  strategy.  The 
clustering  technique  unifies  the  notion  of  matching  and  motion  segmentation  and  pro¬ 
vides  an  insight  into  the  complexity  of  the  matching  and  segmentation  process.  The 
constraints  governing  the  computation  of  the  rigid  motion  parameters  from  retinal 
motion  are  investigated.  The  emphasis  is  on  determining  the  possible  ambiguities  of 
interpretation  and  how  to  remove  them.  This  theoretical  analysis  forms  the  basis  of  a 
set  of  algorithms  for  computing  structure  and  three-dimensional  motion  parameters 
from retinal displacements. The algorithms are experimentally evaluated. The main
difficulties facing the computation are nonlinearity and a high-dimensional search
space of solutions. To alleviate these difficulties, an active tracking method is
proposed. This is a closed loop system for evaluating the motion parameters. Under
such a regime, it is possible to obtain closed form solutions for the motion parameters. This
leads  to  a  robust  cooperative  algorithm  for  motion  perception  requiring  a  minimal 
amount  of  retinal  motion  matching.  The  central  theme  for  this  research  has  been 
the  evaluation  of  a  hierarchical  model  for  visual  motion  perception.  To  this  end, 
the  investigations  revolved  around  three  primary  issues:  (1)  retinal  motion  computa¬ 
tion  from  intensity  images;  (2)  the  conditions  under  which  three-dimensional  motion 
may  be  computed  from  retinal  motion,  and  the  efficacy  of  algorithms  that  perform 
such  computations;  (3)  the  active  vision  or  closed  loop  approach  to  visual  motion 
interpretation  and  what  it  buys  us. 

Chou,  P.  B.-L.,  “The  theory  and  practice  of  Bayesian  image  labeling,”  Ph.D.  The¬ 
sis  and  TR  258,  August  1988:  Integrating  disparate  sources  of  information  has  been 
recognized  as  one  of  the  keys  to  the  success  of  general  purpose  vision  systems.  Image 
clues  such  as  shading,  texture,  stereo  disparities  and  image  flows  provide  uncertain, 
local  and  incomplete  information  about  the  three-dimensional  scene.  Spatial  a  priori 
knowledge  plays  the  role  of  filling  in  missing  information  and  smoothing  out  noise. 
This  thesis  proposes  a  solution  to  the  longstanding  open  problem  of  visual  integra¬ 
tion.  It  reports  a  framework,  based  on  Bayesian  probability  theory,  for  computing 
an  intermediate  representation  of  the  scene  from  disparate  sources  of  information. 
The  computation  is  formulated  as  a  labeling  problem.  Local  visual  observations  for 
each  image  entity  are  reported  as  label  likelihoods.  They  are  combined  consistently 
and  coherently  on  hierarchically  structured  label  trees  with  a  new,  computationally 
simple procedure. The pooled label likelihoods are fused with the a priori spatial
knowledge encoded as Markov Random Fields (MRFs). The a posteriori distribution
of the labelings is thus derived in a Bayesian formalism. A new inference method,
Highest Confidence First (HCF) estimation, is used to infer a unique labeling from
the  a  posteriori  distribution.  Unlike  previous  inference  methods  based  on  the  MRF 
formalism,  HCF  is  computationally  efficient  and  predictable  while  meeting  the  prin¬ 
ciples  of  graceful  degradation  and  least  commitment.  The  results  of  the  inference 
process  are  consistent  with  both  observable  evidence  and  a  priori  knowledge.  The  ef¬ 
fectiveness  of  the  approach  is  demonstrated  with  experiments  on  two  image  analysis 
problems:  intensity  edge  detection  and  surface  reconstruction.  For  edge  detection, 
likelihood  outputs  from  a  set  of  local  edge  operators  are  integrated  with  a  priori 
knowledge  represented  as  an  MRF  probability  distribution.  For  surface  reconstruc¬ 
tion,  intensity  information  is  integrated  with  sparse  depth  measurements  and  a  priori 
knowledge.  Coupled  MRFs  provide  a  unified  treatment  of  surface  reconstruction  and 
segmentation,  and  an  extension  of  HCF  implements  a  solution  method.  Experiments 
using  real  image  and  depth  data  yield  robust  results.  The  framework  can  also  be 
generalized  to  higher-level  vision  problems,  as  well  as  to  other  domains. 

Dibble, P.C., “A Parallel Interleaved File System,” Ph.D. Thesis and TR 334,
March 1990: A computer system is most useful when it has well-balanced processor
and  I/O  performance.  Parallel  architectures  allow  fast  computers  to  be  constructed 
from unsophisticated hardware. The usefulness of these machines is severely limited
unless they are fitted with I/O subsystems that match their CPU performance. Most
parallel  computers  have  insufficient  I/O  performance,  or  use  exotic  hardware  to  force 
enough  I/O  bandwidth  through  a  uniprocessor  file  system.  This  approach  is  only 
useful  for  small  numbers  of  processors.  Even  a  modestly  parallel  computer  cannot  be 
served  by  an  ordinary  file  system.  Only  a  parallel  file  system  can  scale  with  the  pro¬ 
cessor  hardware  to  meet  the  I/O  demands  of  a  parallel  computer.  This  dissertation 
introduces  the  concept  of  a  parallel  interleaved  file  system.  This  class  of  file  system 
incorporates  three  concepts:  parallelism,  interleaving,  and  tools.  Parallelism  appears 
as  a  characteristic  of  the  file  system  program  and  in  the  disk  hardware.  The  parallel 
file system software and hardware allow the file system to scale with the other
components of a multiprocessor computer. Interleaving is the rule the file system uses to
distribute  data  among  the  processors.  Interleaved  record  distribution  is  the  simplest 
and  in  many  ways  the  best  algorithm  for  allocating  records  to  processors.  Tools  are 
application  code  that  can  enter  the  file  system  at  a  level  that  exposes  the  parallel 
structure  of  the  files.  In  many  cases  tools  decrease  interprocessor  communication  by 
moving  processing  to  the  data  instead  of  moving  the  data.  The  thesis  of  this  disser¬ 
tation  is  that  a  parallel  interleaved  file  system  will  provide  scalable  high-performance 
I/O  for  a  wide  range  of  parallel  architectures  while  supporting  a  comprehensive  set 
of  conventional  file  system  facilities.  We  have  confirmed  our  performance  claims  ex¬ 
perimentally  and  theoretically.  Our  experiments  show  practically  linear  speedup  to 
the  limits  of  our  hardware  for  file  copy,  file  sort,  and  matrix  transpose  on  an  array  of 
bits stored in a file. Our analysis predicts the measured results and supports a claim
that the file system will easily scale to more than 128 processors with disk drives.

Floyd, R.A., “Transparency in distributed file systems,” Ph.D. Thesis and TR
272, January 1989: The last few years have seen an explosion in the research and
development of distributed file systems. Existing systems provide a limited degree
of network transparency, with researchers generally arguing that full network
transparency is unachievable. Attempts to understand and address these arguments have
been  limited  by  a  lack  of  understanding  of  the  range  of  possible  solutions  to  trans¬ 
parency  issues  and  a  lack  of  knowledge  of  the  ways  in  which  file  systems  are  used.  We 
address  these  problems  by:  (1)  designing  and  implementing  a  prototype  of  a  highly 
transparent  distributed  file  system;  (2)  collecting  and  analyzing  data  on  file  and  di¬ 
rectory  reference  patterns;  and  (3)  using  these  data  to  analyze  the  effectiveness  of 
our  design.  Our  distributed  file  system,  Roe,  supports  a  substantially  higher  degree 
of transparency than earlier distributed file systems, and is able to do this in a
heterogeneous environment. Roe appears to users to be a single, globally accessible file
system  providing  highly  available,  consistent  files.  It  provides  a  coherent  framework 
for  uniting  techniques  in  the  areas  of  naming,  replication,  consistency  control,  file 
and directory placement, and file and directory migration in a way that provides full
network transparency. This transparency allows Roe to provide increased availability,
automatic reconfiguration, effective use of resources, a simplified file system model,
and important performance benefits. Our data collection and analysis work provides
detailed information on short-term file reference patterns in the UNIX environment.
In  addition  to  examining  the  overall  request  behavior,  we  break  references  down  by 
the  type  of  file,  owner  of  file,  and  type  of  user.  We  find  significant  differences  in  ref¬ 
erence  patterns  between  the  various  classes  that  can  be  used  as  a  basis  for  placement 
and  migration  algorithms.  Our  study  also  provides,  for  the  first  time,  information  on 
directory  reference  patterns  in  a  hierarchical  file  system.  The  results  provide  striking 
evidence of the importance of name resolution overhead in UNIX environments.
Using our data collection and analysis results, we examine the availability and performance
of  Roe.  File  open  overhead  proves  to  be  an  issue,  but  techniques  exist  for  reducing 
its  impact. 

Friedberg, S.A., “Hierarchical process composition: Dynamic maintenance of
structure in a distributed environment,” Ph.D. Thesis and TR 294, 1988: This
dissertation
tion  is  a  study  in  depth  of  a  method,  called  hierarchical  process  composition  (HPC), 
for  organizing,  developing,  and  maintaining  large  distributed  programs.  HPC  extends 
the  process  abstraction  to  nested  collections  of  processes,  allowing  a  multiprocess  pro¬ 
gram  in  place  of  any  single  process,  and  provides  a  rich  set  of  structuring  mechanisms 
for  building  distributed  applications.  The  emphasis  in  HPC  is  on  structural  and 
architectural issues in distributed software systems, especially interactions involving
dynamic  reconfiguration,  protection,  and  distribution.  The  major  contributions  of 
this  work  come  from  the  detailed  consideration,  based  on  case  studies,  formal  analy¬ 
sis,  and  a  prototype  implementation,  of  how  abstraction  and  composition  interact  in 
unexpected  ways  with  each  other  and  with  a  distributed  environment.  HPC  ties  pro¬ 
cesses  together  with  heterogeneous  interprocess  communication  mechanisms,  such  as 
TCP/IP  and  remote  procedure  call.  Explicit  structure  determines  the  logical  connec¬ 
tivity  between  processes,  masking  differences  in  communication  mechanisms.  HPC 
supports  one-to-one,  parallel  channel,  and  many-to-many  (multicasting)  connectivity. 
Efficient  computation  of  end-to-end  connectivity  from  the  communication  structure  is 
a  challenging  problem,  and  a  third-party  connection  facility  is  needed  to  implement 
dynamic  reconfiguration  when  the  logical  connectivity  changes.  Explicit  structure 
also  supports  grouping  and  nesting  of  processes.  HPC  uses  this  process  structure 
to  define  meaningful  protection  domains.  Access  control  is  structured  (and  the  ba¬ 
sic  HPC  facilities  may  be  extended)  using  the  same  powerful  tools  used  to  define 
communication  patterns.  HPC  provides  escapes  from  the  strict  hierarchy  for  direct 
communication  between  any  two  programs,  enabling  transparent  access  to  global  ser¬ 
vices.  These  escapes  are  carefully  controlled  to  prevent  interference  and  to  preserve 
the  appearance  of  a  strict  hierarchy.  This  work  is  also  a  rare  case  study  in  consis¬ 
tency  control  for  non-trivial,  highly-available  services  in  a  distributed  environment. 
Since  HPC  abstraction  and  composition  operations  must  be  available  during  network 
partitions,  basic  structural  constraints  can  be  violated  when  separate  partitions  are 
merged.  By  exhaustive  case  analysis,  all  possible  merge  inconsistencies  that  could 
arise  in  HPC  have  been  identified  and  it  is  shown  how  each  inconsistency  can  be 
either avoided, automatically reconciled by the system, or reported to the user for
application-specific reconciliation.

Loui,  R.P.,  “Theory  and  computation  of  uncertain  inference  and  decision,”  Ph.D. 
Thesis  and  TR  228,  September  1987:  This  interdisciplinary  dissertation  studies  un¬ 
certain  inference  pursuant  to  the  purposes  of  artificial  intelligence,  while  following 
the  tradition  of  philosophy  of  science.  Its  major  achievement  is  the  extension  and 
integration  of  work  in  epistemology  and  knowledge  representation.  This  results  in 
both  a  better  system  for  evidential  reasoning  and  a  better  system  for  qualitative 
non-monotonic  reasoning.  By  chapter,  the  contributions  are:  a  comparison  of  non¬ 
monotonic  and  inductive  logic;  the  effective  implementation  of  Kyburg’s  indetermi¬ 
nate  probability  system;  an  extension  of  that  system;  a  proposal  for  decision-making 
with  indeterminate  probabilities;  a  system  of  non-monotonic  reasoning  motivated  by 
the  study  of  probabilistic  reasoning;  some  consequences  of  this  system;  and  a 
conventionalistic  foundation  for  decision  theory  and  non-monotonic  reasoning. 

Mellor-Crummey,  J.,  “Debugging  and  analysis  of  large-scale  parallel  programs,” 
Ph.D.  Thesis  and  TR  312,  September  1989:  One  of  the  most  serious  problems  in  the 
development  cycle  of  large-scale  parallel  programs  is  the  lack  of  tools  for  debugging 
and  performance  analysis.  Parallel  programs  are  more  difficult  to  analyze  than  their 
sequential  counterparts  for  several  reasons.  First,  race  conditions  in  parallel  programs 
can  cause  non-deterministic  behavior,  which  reduces  the  effectiveness  of  traditional 
cyclic  debugging  techniques.  Second,  invasive,  interactive  analysis  can  distort  a 
parallel  program’s  execution  beyond  recognition.  Finally,  comprehensive  analysis  of  a 
parallel  program’s  execution  requires  collection,  management,  and  presentation  of 
an  enormous  amount  of  information.  This  dissertation  addresses  the  problem  of 
debugging  and  analysis  of  large-scale  parallel  programs  executing  on  shared-memory 
multiprocessors.  It  proposes  a  methodology  for  top-down  analysis  of  parallel  program 
executions  that  replaces  previous  ad-hoc  approaches.  To  support  this  methodology, 
a  formal  model  for  shared-memory  communication  among  processes  in  a  parallel 
program  is  developed.  It  is  shown  how  synchronization  traces  based  on  this  abstract 
model  can  be  used  to  create  indistinguishable  executions  that  form  the  basis  for  de¬ 
bugging.  This  result  is  used  to  develop  a  practical  technique  for  tracing  parallel 
program  executions  on  shared-memory  parallel  processors  so  that  their  executions 
can  be  repeated  deterministically  on  demand.  Next,  it  is  shown  how  these  traces  can 
be  augmented  with  additional  information  that  increases  their  utility  for  debugging 
and  performance  analysis.  The  design  of  an  integrated,  extensible  toolkit  based  on 
these  traces  is  proposed.  This  toolkit  uses  execution  traces  to  support  interactive, 
graphics-based,  top-down  analysis  of  parallel  program  executions.  A  prototype 
implementation  of  the  toolkit  is  described,  explaining  how  it  exploits  our  execution  tracing 
model  to  facilitate  debugging  and  analysis.  Case  studies  of  the  behavior  of  several 
versions  of  two  parallel  programs  are  presented  to  demonstrate  both  the  utility  of  our 
execution  tracing  model  and  the  leverage  it  provides  for  debugging  and  performance 
analysis. 
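
The idea of recording a synchronization trace and using it to repeat an execution can be illustrated with a small sketch. This is not the thesis's tracing protocol: it is a minimal illustration that assumes a single shared object, records the global order in which workers update it, and on replay makes each worker wait for its recorded turn. The names `TracedCell`, `update`, and `run_workers` are hypothetical.

```python
import threading


class TracedCell:
    """A shared cell that records, or replays, the order of its updates."""

    def __init__(self, replay=None):
        self.value = 0
        self.trace = []          # worker ids in the order their updates ran
        self._replay = replay    # recorded trace to enforce, or None to record
        self._cond = threading.Condition()

    def update(self, who):
        with self._cond:
            if self._replay is not None:
                # Replay mode: block until the recorded trace says it is
                # this worker's turn to touch the cell.
                self._cond.wait_for(self._is_turn_of(who))
            self.trace.append(who)
            # Order-dependent update, so different interleavings can give
            # different final values.
            self.value = self.value * 3 + who
            self._cond.notify_all()

    def _is_turn_of(self, who):
        def check():
            step = len(self.trace)
            return step < len(self._replay) and self._replay[step] == who
        return check


def run_workers(cell, workers, steps):
    """Run `workers` threads that each update `cell` `steps` times."""
    def body(who):
        for _ in range(steps):
            cell.update(who)

    threads = [threading.Thread(target=body, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell


if __name__ == "__main__":
    recorded = run_workers(TracedCell(), workers=3, steps=4)
    replayed = run_workers(TracedCell(replay=recorded.trace), workers=3, steps=4)
    print(replayed.trace == recorded.trace, replayed.value == recorded.value)
```

The sketch assumes the replayed program performs the same number of updates per worker as the recorded run; a worker that attempts an update the trace does not contain would wait indefinitely.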

Olson,  T.J.,  “An  architectural  model  of  visual  motion  understanding,”  Ph.D. 
Thesis  and  TR  305,  August  1989:  The  past  few  years  have  seen  an  explosion  of  interest 
in  the  recovery  and  use  of  visual  motion  information  by  biological  and  machine  vision 
systems.  In  the  area  of  computer  vision,  a  variety  of  algorithms  have  been  developed 
for  extracting  various  types  of  motion  information  from  images.  Neuroscientists  have 
made  great  strides  in  understanding  the  flow  of  motion  information  from  the  retina 
to  striate  and  extrastriate  cortex.  The  psychophysics  community  has  gone  a  long 
way  toward  characterizing  the  limits  and  structure  of  human  motion  processing.  The 
central  claim  of  this  thesis  is  that  many  puzzling  aspects  of  motion  perception  can  be 
understood  by  assuming  a  particular  architecture  for  the  human  motion  processing 
system.  The  architecture  consists  of  three  functional  units  or  subsystems.  The  first  or 
low-level  subsystem  computes  simple  mathematical  properties  of  the  visual  signal.  It 
is  entirely  bottom-up,  and  prone  to  error  when  its  implicit  assumptions  are  violated. 
The  intermediate-level  subsystem  combines  the  low-level  system’s  output  with  world 
knowledge,  segmentation  information  and  other  inputs  to  construct  a  representation 
of  the  world  in  terms  of  primitive  forms  and  their  trajectories.  It  is  claimed  to  be 
the  substrate  for  long-range  apparent  motion.  The  highest  level  of  the  motion  system 
assembles  intermediate-level  form  and  motion  primitives  into  scenarios  that  can  be 
used  for  prediction  and  for  matching  against  stored  models.  This  architecture  is  the 
result  of  joint  work  with  Jerome  Feldman  and  Nigel  Goddard.  The  description  of  the 
low-level  system  is  in  accord  with  the  standard  view  of  early  motion  processing,  and 
the  details  of  the  high-level  system  are  being  worked  out  by  Goddard.  The  secondary 
contribution  of  this  thesis  is  a  detailed  connectionist  model  of  the  intermediate  level 
of  the  architecture.  In  order  to  compute  the  trajectories  of  primitive  shapes  it  is 
necessary  to  design  mechanisms  for  handling  time  and  Gestalt  grouping  effects  in 
connectionist  networks.  Solutions  to  these  problems  are  developed  and  used  to  con¬ 
struct  a  network  that  interprets  continuous  and  apparent  motion  stimuli  in  a  limited 
domain.  Simulation  results  show  that  its  interpretations  are  in  qualitative  agreement 
with  human  perception. 

Shastri,  L.,  “Evidential  reasoning  in  semantic  networks:  A  formal  theory  and  its 
parallel  implementation,”  Ph.D.  Thesis  and  TR  166,  September  1985:  This  the¬ 
sis  describes  an  evidential  framework  for  representing  conceptual  knowledge,  wherein 
the  principle  of  maximum  entropy  is  applied  to  deal  with  uncertainty  and  incomplete¬ 
ness.  It  is  demonstrated  that  the  proposed  framework  offers  a  uniform  treatment  of 
inheritance  and  categorization,  and  solves  an  interesting  class  of  inheritance  and  cate¬ 
gorization  problems,  including  those  that  involve  exceptions,  multiple  hierarchies,  and 
conflicting  information.  The  proposed  framework  can  be  encoded  as  an 
interpreter-free,  massively  parallel  (connectionist)  network  that  can  solve  the  inheritance  and 
categorization  problems  in  time  proportional  to  the  depth  of  the  conceptual  hierarchy. 

Sher,  D.B.,  “A  probabilistic  approach  to  low-level  vision,”  Ph.D.  Thesis  and  TR 
232,  October  1987:  A  probabilistic  approach  to  low-level  vision  algorithms  results 
in  algorithms  that  are  easy  to  tune  for  a  particular  application  and  modules  that 
can  be  used  for  many  applications.  Several  routines  that  return  likelihoods  can  be 
combined  into  a  single  more  robust  routine.  Thus  it  is  easy  to  construct  specialized 
yet  robust  low-level  vision  systems  out  of  algorithms  that  calculate  likelihoods.  This 
dissertation  studies  algorithms  that  generate  and  use  likelihoods.  Probabilities  derive 
from  likelihoods  using  Bayes’s  rule.  Thus  vision  algorithms  that  return  likelihoods 
also  generate  probabilities.  Likelihoods  are  used  by  Markov  Random  Field  algorithms. 
This  approach  yields  facet  model  boundary  pixel  detectors  that  return  likelihoods. 
Experiments  show  that  the  detectors  designed  for  the  step  edge  model  are  on  par  with 
the  best  edge  detectors  reported  in  the  literature.  Algorithms  are  presented  here  that 
use  the  generalized  Hough  transform  to  calculate  likelihoods  for  object  recognition. 
Evidence,  represented  as  likelihoods,  from  several  detectors  that  view  the  same  data 
with  different  models  is  combined  here.  The  likelihoods  that  result  are  used  to  build 
robust  detectors. 
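
The step from likelihoods to probabilities by Bayes's rule can be made concrete with a short sketch. It is only an illustration under stated assumptions: the hypothesis labels, the likelihood values, and the function name `posteriors` are hypothetical and are not taken from the dissertation. It shows how a detector that returns likelihoods could be retuned for a different application by changing only the prior.

```python
def posteriors(likelihoods, priors):
    """Bayes's rule: P(h | data) is proportional to P(data | h) * P(h).

    `likelihoods` maps each hypothesis to P(data | h) as returned by a
    detector; `priors` maps each hypothesis to an application-specific P(h).
    """
    joint = {h: likelihoods[h] * priors[h] for h in likelihoods}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("no hypothesis has nonzero likelihood and prior")
    return {h: p / total for h, p in joint.items()}


if __name__ == "__main__":
    # Hypothetical detector output for one pixel (values are made up).
    pixel = {"edge": 0.8, "no_edge": 0.2}
    # The same likelihoods under two different application priors.
    print(posteriors(pixel, {"edge": 0.5, "no_edge": 0.5}))
    print(posteriors(pixel, {"edge": 0.1, "no_edge": 0.9}))
```

Because the detector output is kept as likelihoods, the prior can be changed per application without touching the detector itself.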